PREFACE


With the growing dependence of aircraft designers on analog and digital flight control systems to provide increased
operational capabilities and cost-effectiveness, development of design, implementation and test techniques which can
assure the integrity of such systems has become critical to NATO mission success. This AGARDograph, assembled by
the Guidance and Control Panel of AGARD, brings together related experience in the NATO community as a guide for
future aircraft developments.

The  intent  of  the  AGARDograph  is  to  address  the  hardware  and  software  interface  aspects  of  reliable  flight  control 
systems.  Rapid  advances  in  solid-state  electronics  which  resulted  in  a hundred-fold  decrease  in  computer  size,  power  and 
cost  over  the  past  two  decades  have  revolutionized  the  design  of  modern  flight  control  systems.  Designers  have 
capitalized  on  these  gains  primarily  by  incorporating  additional  control  functions  to  improve  aircraft  or  weapon  system 
performance  and  survivability.  As  a result,  control  system  complexity  also  has  increased  by  1 to  2 orders  of  magnitude, 
and  highly-reliable  flight  control  system  operation  has  become  critically  important  to  mission  planning  and  execution. 
While  some  gains  in  system  reliability  were  obtained  through  redundancy  in  system  mechanization,  concerted  efforts 
aimed  at  improving  system  integrity  were  not  initiated  until  the  late  1960’s.  This  AGARDograph  summarizes  associated 
analysis,  design,  development  and  checkout  approaches. 

The AGARDograph is organized into three major parts. Part I, Background and Requirements, includes overviews
of the historical evolution of flight control systems and their reliability characteristics, as well as discussions of current
and projected reliability trends for flight-critical control applications. Part II, Analysis and Testing, deals with theoretical,
simulation and online techniques for failure detection and prediction. Principal areas addressed here are reliability
modeling and analysis to achieve design integrity for complex systems, self-diagnosis and fault-identification methods,
efficient software generation and management, preflight checkout and built-in test and evaluation. Part III, Design and
Implementation, is devoted to representative high-integrity system mechanizations. Included are design approaches for
redundant flight control sensors, processors, and actuators and selected implementations for fighter, transport and
helicopter applications.

The participation and support of the many individuals who have made this AGARDograph possible are gratefully
acknowledged. Particular recognition should be given to the extensive efforts of the authors in developing the overview
and application papers and to their outstanding cooperation in sharing their experiences and views. The Editor also
wishes to express his appreciation to Mr Colin Copage, UK, Dr Walter Metzdorf, Germany, Mr Alain Chadeau, France,
and Mr Lawrence Taylor, USA, who served as coordinators for the identification and selection of topics and authors
from their respective countries; to the Guidance and Control Panel members and AGARD staff for their valuable
assistance and advice; and to Ms Nancy Owen who handled much of the correspondence and administrative support for
the AGARDograph. The capable review of the AGARDograph by Mr H. Andrews, USA, contributed significantly to
development of the final document.


Peter R. Kurzhals, Director
Guidance, Control and Information Systems Division
NASA Headquarters
Washington, D.C.

A HISTORICAL PERSPECTIVE FOR ADVANCES IN FLIGHT CONTROL SYSTEMS

Duane McRuer
Dunstan Graham
Professor, Princeton University

INTRODUCTION

For the last quarter century, the automatic control of flight has been at the leading edge of
aeronautical development. Automatic flight today retains this position at the technological leading edge
while also exhibiting many features of maturity: one is that a great many people, large resources, and
much time are needed to bring any new concept to fruition. That the fruit is worth growing can hardly be
in doubt to any of us who rely on the modern airliner. This is routinely guided and controlled from climbout
after takeoff to rollout after touchdown by automatic devices operating with unparalleled reliability,
extraordinary precision, and remarkable safety. It is even more evident in the military domain, where
automatic flight is more often than not an absolute necessity.

Our intent in this paper is not to describe where we are in automatic flight, but rather to trace
its development, so as to set the stage and to provide some perspective for what follows in this monograph.
The evolution is interesting both as a tale of science and technology and as a case study of the rise of a
major high technology industry. In both aspects the branches of the story came to a junction point shortly
after World War II. Since that time there has been an enormous amount of activity over a very wide front
which has been well documented. We will touch on this most modern era, but will emphasize the formative
stages, those years which led to the establishment of automatic flight control as a viable and essential
discipline in aeronautics.

There is another highly practical reason for a history of this kind, as perhaps best expressed in
the philosopher George Santayana's remark, "Those who do not remember the past are condemned to repeat it."
It is in this vein as well that we will speak about the history of automatic flight, which has had its
successes and also its failures, its dominant and subordinate trends sometimes recurring, and trace these
successes and failures as a lesson for the future.

The  theme  for  our  history  will  encompass  the  early  independent  development  of  theory  and  practice, 
their  subsequent  confluence,  and  some  relation  of  the  perspective  so  gained  to  the  present  state  of  affairs. 

BEFORE FIRST FLIGHT: THE SEARCH FOR INHERENT STABILITY

Many of the fundamental problems of flight control were apparent before the Wright Brothers. For
instance, the necessity for the control of the path of the airplane was recognized both practically and
theoretically. An early example is the paths shown in Fig. 1, Lanchester's famous phugoids; on a larger
scale the date, November 1897, can be seen in the lower left corner of this figure.

Lanchester was interested in the stability of the paths of airplanes, and he pursued this interest
on both an experimental and a theoretical basis. This was particularly easy to do with the path of gliders
as they were launched. We suspect that most of the readers have at one time or another folded up a paper
airplane and launched it. If the initial speed is just exactly right, the path is a straight line (ideally
horizontal if drag is zero, as assumed by Lanchester). If the speed is higher, the glider will climb and may
even loop, whereas if the launching speed is too slow it will sink, gain speed, and come up again. These
paths are an expression of that fact. The horizontal line at the top is the path of an airplane launched at
the correct speed. The other paths going through approximately sinusoidal flight and then those illustrating
looping flight are the paths of airplanes that are launched with successively higher speeds.
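
The launch-speed behavior described above can be reproduced numerically. The following is a minimal sketch, not Lanchester's own procedure: it assumes a point-mass glider with zero drag (as noted above) and, as an added assumption, lift proportional to the square of the speed at a fixed angle of attack. The function names, the trim speed of 5 m/s, and the step size are illustrative choices.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def derivatives(state, v_trim):
    """Idealized glider: zero drag, lift = weight * (v / v_trim)**2 (assumed model)."""
    v, gamma, x, h = state
    dv = -G * math.sin(gamma)                            # gravity component along the path
    dgamma = G * (v / v_trim**2 - math.cos(gamma) / v)   # (lift - weight component) / (m v)
    dx = v * math.cos(gamma)                             # horizontal velocity
    dh = v * math.sin(gamma)                             # rate of climb
    return (dv, dgamma, dx, dh)


def rk4_step(state, dt, v_trim):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, scale):
        return tuple(s + scale * ki for s, ki in zip(state, k))

    k1 = derivatives(state, v_trim)
    k2 = derivatives(shifted(k1, dt / 2), v_trim)
    k3 = derivatives(shifted(k2, dt / 2), v_trim)
    k4 = derivatives(shifted(k3, dt), v_trim)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(v_launch, v_trim=5.0, t_end=10.0, dt=0.001):
    """Launch horizontally at v_launch; return a list of (v, gamma, x, h) states."""
    state = (v_launch, 0.0, 0.0, 0.0)
    path = [state]
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt, v_trim)
        path.append(state)
    return path


if __name__ == "__main__":
    for ratio in (0.8, 1.0, 1.2):
        heights = [s[3] for s in simulate(ratio * 5.0)]
        print(f"launch at {ratio:.1f} x trim speed: "
              f"height range {min(heights):+.2f} m to {max(heights):+.2f} m")
```

In this idealized model a launch at exactly the trim speed stays level, a faster launch climbs first, and a slower launch sinks first and then recovers, which is the qualitative pattern of the phugoid chart.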

Lanchester published this chart in connection with a patent applied for in 1897. He was the
first to study the stability of the path of the airplane with no control. He was in the tradition of the
people who attempted to secure what they called "inherent" stability, that is, stability properties which
accrued by virtue of fundamental structure, or, in modern terms, stability without feedback.

We now recognize that inherent stability is very unlikely to be achieved, but the early inventors
pursued it with great diligence. Further, we appreciate today that the stability, if any, is largely with
respect to the air mass in which the airplane flies. The inherent stability which the early inventors
attempted to secure by design of the aircraft itself is, in some cases, a major factor in present designs.
For example, Penaud, in his statement of the requirements for a horizontal tail, its angle of incidence,
the location of the wing and tail with respect to the c.g., etc., is still followed in principle to achieve
longitudinal stability. On the other hand, Lilienthal emphasized inherent lateral stability. He achieved
it in several of his gliders, with the ultimate result that it killed him. The large effective dihedral of
his hang glider created an inherent stability not with respect to the earth but with respect to the air
mass, and on his last flight his control power was insufficient to offset a gust disturbance.

While the Wright Brothers are justly famed for their priority in many fields of aviation, their
most notable contribution was the implicit appreciation that the secret to the control of flight was
feedback. From their tethered and glider experiments they recognized that the human pilot, operating on
feedback signals, that is, his attitude with respect to the ground, his position with respect to a desired
landing point, etc., should be able to operate the controls so as to stabilize, control, and guide the
aircraft in a desirable fashion. They recognized that the frustrating search for inherent stability might
well be abandoned if only the pilot were provided with sufficiently powerful controls with which to balance
and steer; in other words, that the human pilot, operating on feedback signals, could use the controls to
stabilize a neutrally stable or an inherently unstable aircraft. The Wrights proceeded to build this
aircraft configured for good control and were ultimately able to demonstrate its compliance with the first
flying quality specification, a single sentence from Signal Corps Specification 486, "During this trial
flight of one hour it must be steered in all directions without difficulty and at all times under perfect
control and equilibrium." If only we could achieve such concise and relevant, yet subtle, language in our
modern flying quality specifications!

So, by virtue of the Wright Brothers, we have feedback control and control-configured vehicles as
fundamental features in aircraft. Let us turn now to the main theme where the feedback control is
accomplished in an automatic way. To do this, the story will be divided into three time periods.

THE FIRST PERIOD: 1890-1934

The first period ranges from about 1890 to 1934. During this time, two types of engineers were
active: the dynamics theoreticians and the tinkerer/inventors. They were both interested in path control,
but their approaches and views kept them in essentially complete isolation from each other. The
tinkerer/inventors were interested in stabilizing the path with control apparatus which served to control
the aircraft so as to duplicate gyroscopic references. They were not particularly concerned with the actual
dynamics of the aircraft except as these might incidentally interfere with this process. Consequently, they
understood little about the aircraft dynamics except as a source of lags, and these lags were often second
order to those of the primitive actuation equipment. For this reason, control at best was exerted on the
kinematic and long-term motions of the aircraft and the aircraft dynamics had relatively small effect.

On the other hand, the aircraft stability and control dynamicists, the theorists, emphasized
empirical and mathematical treatments of the dynamics of the uncontrolled aircraft. In spite of the
practical demonstration of the Wright Brothers, they were after inherent stability and they studied the
subject for that purpose. The most perceptive of the theorists believed from the start that the "controlled
aircraft" was the important thing, but that pilot actions were too complex and variable to analyze.
"Automatic stability" depending on the use of gyros, pendulums, or other movable parts to create airplane
stability was not well thought of. For example, the great pioneer of aircraft stability and control,
G. H. Bryan, said in 1910, while speaking of the stability of aircraft (Ref. 1):


"Apart  from  the  fact  that  movable  parts  are  liable  to  get  out  of  order,  it  must  be 
remembered  that  they  increase  the  number  of  degrees  of  freedom  of  the  machine,  thus 
further  adding  to  the  number  of  conditions  which  have  to  be  satisfied  for  stability  — 
a number  quite  large  enough  already.  I anticipate  that  the  successful  aeroplane  of 
the  future  will  possess  inherent,  not  "automatic"  stability,  movable  parts  being  used 
only for purposes of steering."


The first among the tinkerer/inventors was Sir Hiram Maxim, an expatriate Yankee, whose life
provided one of Don Ameche's many biographical movie roles. Maxim was a prodigious inventor, most famous
perhaps for, and certainly rich because of, the Maxim machine gun. In [...] he began the construction of a
giant flying machine (Fig. 2). Photographs are available, but they don't convey the impression that this
drawing does of this fabulous and colossal machine. It was 110 ft long and weighed 3-1/2 tons. It was
powered with a steam engine. Maxim configured his experiments in such a way that the machine had about
[...] inches between its support on a rail and a "tether rail" which was designed to restrain it from
really flying. In one of his very early trials the machine actually lifted enough to break the restraining
tether rails on one side, [...] and crashed. Maxim, who by his own account intended to [...] in aeronautics
with [...] lift and power, thought that the point had been proved and abandoned his aeronautical
experiments after [...]

[Figure 2 labels: upper wing rigged to fairly sharp cathedral angle at tips; main wing set at neutral angle
(flat) laterally (all three wings set at considerable angle of incidence); elevator; fixed stabilizer set at
slight negative angle of incidence; lower wings set at slight dihedral angle; steam chest; tether rod;
propellers]
Now, why do we show you this? It is because Maxim, the tinkerer/inventor, had installed in this giant
flying machine what we would now call a stability and control augmentation system of an incredibly advanced
design for the time. Figure 3 is a drawing, from Ref. 2, of this apparatus. In his book, Artificial and
Natural Flight (Ref. 3), Maxim described his pendulous gyroscope as steam driven "after the fashion of a
Barker's mill." A Barker's mill is a mechanical contrivance invented by a Dr. Barker about the end of the
seventeenth century. It consists of a hollow cylinder provided with a number of horizontal arms fitted
with lateral apertures, mounted so as to rotate about the cylinder's axis of symmetry. In Maxim's case,
steam provided the motive force. This was Maxim's gyroscope drive principle. The gyroscope was connected
to a valve, and the valve body was repositioned by the output of a servo. It is interesting to note that
the output could also be positioned by an "engineer" with his pitch command lever. So, we have attitude
hold and provision for pilot intervention with a high-performance servo system and a gravity-erected
gyroscope. The date: approximately 1894. The airplane did not fly, but if it had, the autopilot might well
have worked.


Somewhat later but still very early, Elmer Sperry began experimenting with gyroscope stabilization
of the airplane. After some abortive first efforts in 1909-10, in the period from 1910 to 1914 he and
his son Lawrence, who was keenly interested in aeronautics, developed what they called a "stabilizer" and
which was in fact an autopilot.


Most readers of this paper have probably seen the photograph, which we will not show, of the
demonstration of the Sperry stabilizer in 1914 winning a safety prize donated by the Aero Club of France.
The way in which they demonstrated the pitch and roll control was for Lawrence Sperry to fly at low
altitude over the Seine standing up in the cockpit of his Curtiss Flying Boat holding his hands over his head
while his mechanic walked out on the wings. In this way they exhibited the capabilities of the control
system and, simultaneously and incidentally, a most touching faith in the reliability of the equipment!
Figure 4 illustrates the equipment out of the Flying Boat. The heart of the system was a gyro platform:
four gyroscopes on a pendulously suspended gimballed platform. The gyros maintained the platform alignment.
Their spin axes were all horizontal, with one pair aligned longitudinally and the other pair transversely.
The gyro pairs had oppositely spinning wheels and were coupled for equal and opposite precession. Roller
contacts on the stabilized platform measuring the bank and pitch angles actuated solenoid clutches in
pneumatic servos connected to the ailerons and elevator; motion of the surfaces repositioned the contactor
elements. An engine-driven alternator provided gyro power and the servos were powered by compressed air from
an engine cylinder. It worked very well, but noisily.


Other inventors were very busy. Starting about the time that flying came to Europe, people
tried or conceived of all kinds of automatic stabilization for an aircraft. They used the feedback of
speed, of incidence (what we now call angle of attack), of inclination (what we now call pitch angle), of
its derivative, etc., and they attempted power amplification and servomechanism drives of the control
surfaces. Figure 5, which is not complete, is in fact taken from a 1936 NACA TM (Ref. 4). The chart is
simply an abstraction of a vast amount of work. Perhaps a sad part of all this vast experimentation on
feedback control of aircraft was that nobody had any use for it. The designers of aircraft had learned
how to provide enough stability so that the pilots could handle the airplanes and nobody needed feedback
control.


In 1932 a three-axis Sperry autopilot, very similar to the original 1912-14 system with the
addition of a rudder axis, was bought by Eastern Airlines for their Curtiss Condors. The Condor was the
last of the biplane transports in service in the United States, and its autopilot was the last of the
primitive devices. By 1933 the Sperry Gyroscope Co. had developed their A-2 autopilot. The A-2 was much
more modern in concept and execution. It was essentially proportional attitude control. It had pneumatic
pickoffs on the gyros and hydraulic servos. Its first practical application was to the around-the-world
flight of Wiley Post between the 15th and 22nd of July in 1933, when he flew his Lockheed Vega 5B,
the "Winnie Mae," around the world in a total flying time of 115 hours, 36-1/2 minutes. Post gave the
A-2 autopilot, "Mechanical Mike," credit for helping the success of this incredible flight. He was able
to nap in flight while the aircraft was under automatic flight, again a most touching faith in the
reliability of the autopilot! The New York Times called the flight "a revelation of the new art of flying."

The news report added:

"By winning a victory with the use of gyrostats...Post definitely ushers in a new stage
of  long-distance  aviation.  The  days  when  human  skill  alone,  an  almost  birdlike  sense 
of  direction,  enabled  a flyer  to  hold  his  course  for  long  hours  through  a starless  night 
or  over  a fog  are  over.  Commercial  flying  in  the  future  will  be  automatic." 

By 1934 this same autopilot, the Sperry A-2, was ordered and installed in what is now United
Airlines' Boeing 247s. Just as the Condor was the last of the biplane fabric-wing transports, the Boeing
247 was the first of the all-metal monoplanes. Thus, 1934 provides an appropriate end for the first era
of automatic control of aircraft from the standpoint of the tinkerer/inventor.

In the meantime, the theoreticians had not been idle. Lanchester and his study of the phugoid
motions have already been mentioned. Not very long after Lanchester's investigations, in fact in 1903,
the year of the first flight, we have the first major contribution of Bryan. For starters, he studied the
linearized motions of the airplane, assuming small perturbations; discovered the separation of the
longitudinal and lateral motions; invented stability derivatives, etc. Only the orientation of his axis system
differed from modern usage! Shortly after that, Bairstow and Melvill Jones, at the National Physical
Laboratory in Great Britain, measured the stability derivatives and calculated the motions of practical
airplanes (e.g., Ref. 5). In the period from about 1910 through the early 30's there was an enormously
productive effort in Great Britain. People calculated the stability of aircraft, calculated the response
to disturbances, calculated the response to applications of controls, made full-scale in-flight
measurements to show that the responses were correct, etc.

Perhaps most notable from the automatic control standpoint during this period are the efforts
of Gates and Garner. Gates, in 1924 (Ref. 6), assumed that the controls were moved according to certain
"laws," that is, in proportion to certain output variables and their derivatives. He stressed that good
stability was not enough, that it was essential also to consider the amplitudes of the several modes of
motion. In 1926 Garner (Ref. 7) made an analysis of the lateral-directional motions of an airplane under
the influence of feedback control. He specifically pointed out that the movements of the controls might
be regarded as made either by the human pilot or by some mechanical means. Garner further had the wit and
vision to make provision in the theoretical treatment for lag in the application of controls and was able
to point to a qualitative correspondence between his analytical results and flight tests of an RAE
automatic rudder control which had an appreciable lag.

It now seems surprising that these papers are not given more prominence in accounts of the
development of the theory of automatic control systems. They seem, in fact, to have fallen in a deep dark
hole. Perhaps they were simply too far ahead of their time; perhaps, on the other hand, it was only in
Great Britain, where automatic flight control system development at this time was the responsibility of a
government research establishment, that it was thought to be desirable to make response calculations in
connection with the design of "practical" systems. It cannot be said that the people who were developing
autopilots paid no attention to the theoreticians; they were sitting across the hall from one another and
they did know what the theoreticians were doing. For example, as early as 1937, in the dawn of the second
era, we have the paper by Meredith and Cooke (Ref. 8). They crossed the lines by describing both the
practical and theoretical aspects of autopilot development.

By 1935 when B. Melvill Jones surveyed stability and control in "Dynamics of the Airplane,"
Section N of Vol. V of Durand's Aerodynamic Theory (Ref. 9), the classical approach initiated by Bryan was
well-established but very little used. The theory of small perturbations, the examination of stability,
the ability to calculate the time history in response to disturbance or to the application of control, the
full-scale experiments that led to the conviction that the theory of infinitesimal motions was practical
for the prediction of stability of motion, etc., were all meticulously and elegantly covered. The effects
of variations in the configuration of a typical airplane were traced via their influence on the derivatives
to the result in terms of stability in motion. Furthermore, these results were appreciated not only in
terms of the solutions to specific numerical examples, but more generally as approximate solutions given
in terms of the dominant literal stability derivatives. But Melvill Jones did not cover feedback control
of the aircraft's motions although he wrote a decade after Gates' initial efforts. He recognized that:

"It  is  probable  that  mechanical  control  will  become  increasingly  popular  for  large 
long-distance aeroplanes and for anything in the nature of pioneer work in this
subject  calculations  of  this  kind  are  essential.  No  mention  of  the  methods  of 
extending  the  calculations  to  deal  with  mechanical  control  will,  however,  be  found 
in  the  present  work  since  this  is  still  a matter  of  research  and  what  little  has 
been  published  is  mainly  of  a controversial  nature." 

He did recognize that "work of the type discussed here forms an essential introduction to the study of
mechanical control." Melvill Jones' comment on the application of the theory which he did cover, i.e.,
aircraft-alone dynamics, was:

"In spite...of the completeness of the experimental and theoretical structure...
it is undoubtedly true that, at the time of writing, calculations of this kind are
very little used by any but a few research workers. It is in fact rare for anyone
actually engaged upon the design and construction of aeroplanes to make direct use
of [such] computations..., or even to be familiar with the methods by which they
are made....In my own opinion it is the difficulty of computation...which has
prevented designers of aeroplanes from making use of the methods...."

We shall refer again to this quotation. But it does, by extension, make matters clear about automatic
flight as well. Since the procedures then available for treating automatic systems involved factoring
quintics or higher degree polynomials, whereas the aircraft-alone equations were only quartics, it is easy
to see why very few people were interested in pursuing design calculations in any depth.
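
As a hedged illustration of the polynomial degrees just mentioned (the symbols below are introduced here for illustration and do not appear in the original): suppose the aircraft-alone characteristic polynomial is a quartic, and a control surface is moved in proportion to a measured motion variable through a servo with a single first-order lag of time constant T and gain K, where N(s) is the numerator of the corresponding aircraft transfer function. Clearing fractions in the closed-loop condition then raises the degree by one:

```latex
\Delta(s) = s^4 + a_3 s^3 + a_2 s^2 + a_1 s + a_0 = 0
\qquad \text{(aircraft alone: quartic)}

(T s + 1)\,\Delta(s) + K\,N(s) = 0
\qquad \text{(with lagged feedback: quintic, since } \deg N < 4\text{)}
```

Each additional lag or compensating element in the loop adds a further degree, which is the computational burden referred to above.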

THE SECOND PERIOD: 1934-1947

Let us turn now to the second era, which persisted from 1934 to 1947. The approaching Second
Great War dramatically influenced the development of automatic pilots and encouraged the further
elaboration of the theory, but they still remained largely separate lines of endeavor. What happened, in the
United States anyway, was the very rapid development of what was then called the all-electric automatic pilot.
Recall that the Sperry 1914 autopilot was electric in its sensors and pickoffs but not in its actuation.
Subsequently, the Sperry Co. went to pneumatic pickoffs, pneumatic power for the gyroscopes themselves,
and hydraulic actuation. The all-electric autopilots were in fact all-electric in the sensors, pickoffs,
power amplification, and actuation (Ref. 10). The flexibility associated with this means of mechanization
permitted rapid introduction of a number of novel features: a single-knob turn control (replacing
three different knobs), erection cutout, altitude and heading as outer loops superimposed around the
previous pitch and bank loops, synchronizers, and rate gyros or electrical compensation to increase damping
all appeared in practical production flight hardware within a very short time. Again, almost all of this
was accomplished by the tinkerer/inventors operating with little or no theoretical backup. Like aircraft
themselves, the stability and control properties of the closed-loop systems were evaluated in flight tests,
and flight control equipment was also designed with the aid of extensive full-scale testing. The "curse
of dimensionality" (with apologies to Richard Bellman) mentioned by Melvill Jones was still present, and
cut-and-try did the job; indeed, so well that all the elements of a modern automatic pilot were now at
hand.


The triumph of the tinkerer/inventors came in 1947. On the second column of the page from the
New York Times for September 23, 1947, shown in Fig. 6, is an article which describes the flight of the
U.S. Air Force's All-Weather Flying Division's C-54, "Robert E. Lee." This aircraft had a Sperry A-12
autopilot with approach coupler and a Bendix automatic throttle control. These were more or less state
of the art at the time. It also had some fairly special-purpose IBM equipment which permitted commands
to its automatic control to be stored on punched cards fed automatically. From the time that the brakes
were released for takeoff from Stephenville, Newfoundland, until the landing roll was completed at
Brize-Norton in England the next day, no human hand touched the control. The selection of course,
radio station, speed, flap setting, landing gear position, and the final application of wheel brakes were
all accomplished automatically from a program stored on punched cards. The complete automation of
aircraft flight appeared again to be at hand.

Figure 6.


The second era also saw the very rapid development of theory with which we are familiar today.
Servo analysis techniques as they derived from feedback amplifier design were introduced first to
servomechanisms and later to aircraft. The key contributions of Nyquist (Ref. 11), Bode (Ref. 12), Nichols
and Phillips (Ref. 13), Harris (Ref. 14), Hall (Ref. 15), the stability diagrams (now called parameter
spaces), Evans' root locus (Ref. 16), time vectors (Refs. 17-18), etc., were all developed during this
era. Although they were scarcely ever applied to automatic flight control system design, the techniques
were there waiting in the wings: theories in search of problems.


THE MODERN PERIOD: 1947-

The problems were not long in coming. The war had seen the advent, on both sides, of the
turbojet engine, and suddenly the limits of the flight envelope were enormously extended in both speed and
altitude, with concomitant configuration changes involving increased wing loadings, mass distributions
concentrated in long thin fuselages, the aerodynamic benefits of short span, swept wings, etc. All sorts of new
problems arose that were of interest both to the aircraft designer and to his new fixit man, the flight
control designer. New phenomena were even discovered: fuel slosh, rolling instability, structural
instabilities influenced by automatic control, etc. Power-boosted controls came into use to handle the large
hinge moments of the control surfaces, and these actuators had stability difficulties of their own. All
of these trends were bad news for the automatic flight control system designer, who now desperately needed
and wanted analytical help. People suddenly seemed to realize that pooling the knowledge of dynamic
stability with the knowledge of instrument design was essential for the betterment of aeronautics if this was
to be accomplished in an expeditious way without expenditure of an excessive number of experimental flight
hours, each fraught with extraordinary adventures for test pilots. So, while the intimate joining of
control technology and vehicle dynamic analysis would no doubt have come about in any event, it was forced
by the marked deficiencies in stability of the new jet aircraft and by the advent of the guided missile,
where it was obviously essential to match the dynamics of the airframe and the control system from the
first flight on. This is the confluence of theory and practice. One of us likes to date this as 1947 to
1948 and associate it, admittedly on a personal basis, with a remarkable airplane now little remembered.

Figure 7 shows the B-49, which in 1948 was to be the production bomber for the United States Air
Force. It was the last and most successful of John Northrop's great series of all-wing aircraft. In our
modern jargon, it was a control-configured vehicle, and its great success as a flying machine was peculiarly
dependent upon many flight control system developments.

Its control surfaces were powered by the first successful fully-powered hydraulic actuators developed
in the United States on its propeller-driven version, the XB-35, first flown in 1946. These were essential
because of anticipated (and actual) unstable hinge moment gradients due to increasing separation over the
trailing edge as stall was approached. The airplane was also equipped with a series-installed yaw damper/
rudder trim system in a quasi-fly-by-wire configuration which was, as far as we know, the first successful
stability augmentor flown in the United States. [K. H. Doetsch had earlier applied a bang-bang yaw
damper to a rudder tab on the Henschel Hs 129 (Ref. 2)]. In fact, the very name "stability augmentor" stems
from this aircraft. It was originally "stability derivative augmentor," but the middle word was deleted
to more readily fit the title block of an installation drawing. Besides the obvious configuration aspects to
maximize performance while attending to the consequent control problems via automatic control, considerable
thought was given to further improvement of the landing and cruise performance by flying the aircraft with
an unstable c.g. location. Analytical and experimental studies, including a flight demonstration, of
stabilization of a 10 percent unstable aircraft with automatic control were undertaken and seriously
considered for application. This was not adopted because the aircraft met requirements readily without the
additional automatic system complexity. But the important thing for our story is that this is one of the
first, if not the first, examples of the marriage of the science of the theoretician with the art of the
tinkerer/inventor.

Frankly, our recollection is that no one thought much about it at the time. The problems were foremost
and their solution required the wedding even if a shotgun were needed. So, in short order, there was
invented, or re-invented, in aircraft plants and autopilot companies all over the world, the yaw damper,
pitch damper, roll damper, sideslip stability augmentor, transonic trim shifter, autothrottle, and other
devices. These were applied with close connections between theory and practice to the alleviation of the
new dynamical effects.

Figure 7. The B-49, Northrop Flying Wing

Unfortunately, in spite of the flight of the "Robert E. Lee" and the 2-1/2 decades since that
time wherein theory and practice have been well connected, all of our problems in automatic flight are yet
to be solved. The reason we continue to have problems, and the reason why some of us as engineers are still
employed in this business, is that our hardware/software capabilities have expanded enormously and that the
requirements are changeable and multifaceted and, often, somewhat subtle to appreciate. While the
theoretical structure is well developed and practically applied in design, the actual selection of a design for
a particular aircraft depends on a very large number of things which do not readily lend themselves to
inclusion in, for example, a cost functional. The proper specification and satisfaction of all those
desirable characteristics in the dawning new fourth era of automatic flight control will be central, for
in this era the automatic control will be necessary for the successful performance of some aircraft in a
majority, if not all, of the flight regimes. We are faced with major new challenges in which full-time,
total-flight-envelope flight control promises new dimensions of both aircraft and total system
performance. The shibboleths of the new flight control technology are words like multimode, full-flight envelope,
decoupled (roll/yaw, speed/flight path, rotation/translation), direct lift and direct side force,
redundancy, graceful degradation, and other good words adopted by the autopilot salesman to describe the virtues
of his products. To satisfy the interacting requirements and make good on the descriptive phrases requires
the same kind of engineer for the fourth era as was developed and operated in the third. The details of
the hardware and software for highly redundant and complex equipment at the fringe of the state of a
particular hardware art can never be permitted to get too far from the comprehension of an analyst charged with
overall system cognizance. At the same time, the vision of flight control theoreticians should never become
so opaque as to provide results of theoretical interest only. The dangers of a new separation between
theory and practice are, we believe, increasing. For example, as Melvill Jones noted, two generations ago
the intellectual mathematical equipment of skilled stability and flight control system analysts generally
exceeded their physical ability to perform the calculations which might be needed or desired. Nowadays,
quite the opposite situation exists, because advances in both analog and digital computation allow the
consideration of problems which at one time would have been rejected as being too time consuming. As a
consequence, the analyst's physical means now often exceed his mental grasp, and what he can compute may
far exceed his understanding or appreciation. This can lead to an excessively empirical approach to design
which is similar to the one used by the tinkerers thirty or more years ago. But a key difference exists in the
abstractions involved. Regardless of the detail and complexity of our mathematical models, they remain just
that, whereas the physical equipment and the aircraft which are the objects of our abstractions were the
tinkerer's model. Viewed in these terms, too great a reliance on a numerical empirical approach to design
is no better, and may be even worse, than the physical empiricism of earlier days. Inundated by
computer printouts and strip chart recordings, we are confronted with a crucial problem: what is the essence,
what does it all mean? And even when this is unraveled, paper studies are obviously only as good as the
implicit underlying assumptions. No matter how prescient the engineer may be in analytical forecast of
system normal and abnormal behavior, one invariably finds a reservoir of residual problems when the
apparatus is built. Thus, in the fourth era of flight control, it is essential that we keep the tinkerer/
inventor and the theoretician communicating so that the science of automatic flight control retains its
maturity without progressing to senility.
CHRONOLOGICAL OVERVIEW OF PAST AVIONIC FLIGHT CONTROL
SYSTEM  RELIABILITY  IN  MILITARY  AND  COMMERCIAL  OPERATIONS 

S.  S.  Osder 
Sperry  Flight  Systems 
Phoenix,  Arizona  85036 
USA 


Two decades of flight control system mechanization advances are traced from the perspective of
reliability. Despite dramatic advances in device technology and miniaturization, the demand for more
functions tended to exceed the progress in electronics. By the latter 1960's, complexity growth related to
system monitoring and redundancy management reached limitations of analog technology and set the stage
for introduction of digital flight control systems.


1.0  INTRODUCTION 

The rapid advances in solid state electronic technology have made a very significant impact in
avionics equipment. In many instances, the new electronics have provided the potential for a weight
improvement of nearly two orders of magnitude over the state of the art in the 1950's. Automatic flight
control systems have exploited this electronic progress to the greatest extent possible, and yet the
newest systems are not significantly reduced in size over their predecessors of a decade ago.
Regardless of the improvements in electronic miniaturization, they are always outstripped by demands for more
functions and additional automation. We can indeed build the automatic flight control system of the
1950's in a fraction of its original size and with major improvements in reliability. However, there
are no customers for this type of progress. The demand is for new functions that require automatic
flight controls to play a far more important role in normal flight missions.

The technology transitioned from vacuum tube circuits to the first transistorized mimics of these
tube circuits and finally to the large scale integrated devices that dominate the designs of the
mid-1970's. The applications grew from the simple pilot relief autopilot to the redundant, flight-critical
guidance and control systems that provided such functions as automatic landing, stabilization of
marginally controllable airframes and fly-by-wire. The intent of this paper is to provide a perspective of
how we evolved to the present state of the art in automatic flight control, with emphasis on how the
pursuit of reliability always challenged and motivated the technology advances.

2.0  REVIEW  OF  ELECTRONIC  PROGRESS  IN  AUTOMATIC  FLIGHT  CONTROLS 

Starting the review of automatic flight control technology with the first silicon autopilots, circa
1958, it can be shown that the electronic features of that era became liabilities within a few years.
Table 1 illustrates the evolution of equipment features through six generations of design innovations
that have occurred. The 1958 systems used electromechanical synchronizer/integrator computational
modules, building-block circuits in the form of amplifiers, demodulators and modulators, miniature sealed
relays to implement the various interlock functions, and external gain control potentiometers inserted
in the control signal path. By 1960 to 1962, improved transistor performance and miniaturization
resulted in techniques that made many of these 1958 features obsolete. Device miniaturization and reliability
improvements allowed the use of embedded functional modules that enhanced maintainability. Solid-state
switching began to replace the miniature relays despite the continued improvement and miniaturization
of relays. Crude electronic multipliers eliminated the need to interrupt the control signal path with
remote gain control potentiometers.

TABLE 1

SIX GENERATIONS OF PROGRESS IN AUTOMATIC FLIGHT CONTROL ELECTRONICS

Generation  Time Period  Main Improvement             Equipment Characteristics
----------  -----------  ---------------------------  ------------------------------------------------
1           circa 1958   First Silicon Transistor     • Electromechanical Instrument Servo Computing
                         Autopilots                     Elements
                                                      • Building Block Circuits
                                                      • Gain Control Potentiometers
                                                      • Miniature Sealed Relays

2           1960-1962    Planar Transistors -         • Electronic Multipliers Replace Gain Control
                         Miniature Embedded             Potentiometers
                         Circuits                     • Limited Introduction of Solid-State Switching
                                                        to Replace Relays
                                                      • Miniature Embedded Building Block Circuit
                                                        Modules

3           1963-1965    DC Computation - Limited     • Greatly Improved Computation Accuracy
                         Microelectronics             • Introduction of Solid-State Synchronizers
                                                        Replace Electromechanical Computing Elements

4           1966-1968    All Microelectronic          • Microelectronic Subassembly Functional Modules
                         Autopilots                   • All Solid-State Switching

5           1969-1974    MSI - LSI Components         • Built-In Digital Data Interfaces
                                                      • Multi-Function Monolithic and Hybrid Devices

6           1974-        Digital Autopilots           • General Purpose Digital Computer provides all
                                                        Control and Logic


By 1963 to 1965, improved transistor stability allowed the application of dc computation techniques,
ushering in an era of additional miniaturization resulting from the elimination of suppressed carrier
data handling circuitry. Also, accuracy improvements as great as 10:1 over previous mechanizations were
achieved. Almost all relays could now be eliminated, solid-state synchronizers replaced the
electromechanical units, and a limited number of microelectronic circuits were introduced. By 1966 to 1968, the era
of the all-microelectronic autopilot had arrived. A single microelectronic subassembly module 10 cubic
centimeters in volume could perform complete signal processing and control law computation functions with
self-contained digital logic and gain control interfaces. An overview of this evolution is summarized
in Figure 1. Because of a change in the electronic techniques available, the system organization moved
toward the use of miniature, self-sufficient subassembly computing and signal processing modules. An
example of how this simplified system organization works is shown in Figures 2a and 2b. The earlier systems
required complicated routing of control signals in and out of the computers for gain controls applied by
external devices (Figure 2a). Indeed, some of the most unreliable designs appeared in the late 1950's,
ironically as a result of improved frequency domain system analysis and synthesis techniques based on
root locus S plane design criteria. The designers attempted to program control laws as a function of
altitude, Mach number, speed and aircraft configuration so that the closed loop system poles always
remained in narrow regions of the S plane. Subsequent insights recognized that an aircraft's disturbance
and command responses could be significantly different, so that relatively high gain control augmentation
systems could be designed with very non-critical requirements for gain programming. The unwieldy
complexity of the earlier systems, however, with their complex arrays of air data controlled potentiometers,
was a prime motivation for the enchantment with "adaptive control" as a solution to these unreliability
problems in the late 1950's and early 1960's (Reference 1).
Figure 1. Evolution of AFCS Control Computer Organization (1958 - 1968)

Figure 2a. System Gain Control Mechanization with Potentiometer Multipliers in Signal Path (1958)

Figure 2b. System Gain Control with Internal Electronic Multiplication (1963 to 1966)


In the later systems, as exemplified by Figure 2b, the gain control means are self-contained within
the signal processing and computation modules, which accept remote gain control commands as well as
signal transmit or signal inhibit commands.


Another important electronic innovation that affected the internal organization of the AFCS
computers was the replacement of the electromechanical instrument servomechanisms with all solid-state
devices (Figures 3a and 3b). The electromechanical units that were very prevalent in the 1958 generation
of AFCS equipment provided an extremely versatile computational and data handling capability. As
illustrated schematically in Figure 3a, they can be used for synchronization, integration, and filtering
functions as well as providing a means for mechanically activated limiting and level detection functions.
Mechanical devices that were included in electronic computers, however, fell into poor repute in the
early 1960's. Improved reliability and maintainability objectives motivated the development of
solid-state replacements. While the electromechanical device could perform a variety of functions, only the
synchronizer function was difficult to reproduce economically in a solid-state mechanization. Since it
required a long-term, drift-free memory, the first versions used analog-to-digital converter registers
or counters to store the information, and digital-to-analog converters to complete the synchronizer
signal processing loop. Although such conversion devices became commonplace by the mid-1970's as a
result of progress in monolithic integrated circuits, in the mid-1960's these converters were large and
expensive. They showed little improvement in size and power consumption over miniature
electromechanical units, and resulted in a definite cost penalty. They were used only to improve reliability. Were
the electromechanical units really so unreliable? The answer depends upon the design application and
the quality of the maintenance. This theme recurs as we examine the reliability history of all types
of components used in flight control systems. The same type of device commonly gives one to two orders
of magnitude difference in field MTBF because of design deficiency and/or poor maintenance. Since the
reliability of electromechanical computing units is so sensitive to poor maintenance, their replacement
with solid-state devices has generally been applauded. By the late 1960's, however, the development of
extremely high impedance MOS amplifiers in conjunction with special capacitors permitted the achievement
of analog memories that actually replaced the digital mechanizations. Thus, a typical solid-state
synchronizer-integrator function of 1968, as illustrated in Figure 3b, consisted of an analog-hold
circuit, an operational amplifier integrator circuit, a multiplier circuit for line voltage compensation,
and switching logic to control the signal inputs to the integrator and the analog-hold memory.
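
As a rough illustration of the synchronizer function described above, the following Python sketch models an idealized track-and-hold memory: while a mode is disengaged the stored reference follows the input so that the output stays at zero, and on engagement the reference is frozen so that the output becomes the deviation from the held value. The class name, method names and numeric values are illustrative assumptions and are not taken from any actual autopilot mechanization; drift, limiting and line voltage compensation are ignored.

```python
class Synchronizer:
    """Idealized track-and-hold synchronizer (hypothetical model)."""

    def __init__(self):
        self.reference = 0.0   # stands in for the analog-hold memory
        self.engaged = False   # state of the switching logic

    def engage(self):
        # Freeze the reference at its last tracked value.
        self.engaged = True

    def disengage(self):
        # Resume tracking so the output returns to zero.
        self.engaged = False

    def update(self, signal):
        # While disengaged, the memory follows the input (synchronizing).
        if not self.engaged:
            self.reference = signal
        # The output is the deviation from the stored reference.
        return signal - self.reference


sync = Synchronizer()
print(sync.update(10000.0))   # tracking: output is zero
sync.engage()                 # e.g. a hold mode is selected
print(sync.update(10050.0))   # held: output is the deviation from the reference
```

The essential requirement noted in the text is visible here: once engaged, the stored reference must not drift, which is why the first solid-state versions resorted to digital registers before adequate analog memories became available.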


Figure 3a. Electromechanical Instrument Computing Element (1958). Functional capability: synchronizer, filter, integrator.

Figure 3b. Solid State Replacement for Electromechanical Computing Element (1963)

The impact of these electronic advances on equipment size is illustrated in Figure 4. A typical
1958 subassembly that required about 930 cubic centimeters of circuit cards had its functions
reproduced in 1968 with two microelectronic subassembly embedded modules having a volume of less than 30
cubic centimeters. By 1973, these same functions could be incorporated in two hybrid modules having a
volume of approximately one-fifth to one-tenth the volume consumed by the 1968 design. The 1958
starting point on Figure 4 is a large plug-in module from the autopilot used on the U.S. Navy A-6 aircraft.
The 1962 module is a control axis for a missile autopilot. The 1968 units are used in the autopilot for
the Boeing 747 transport and the 1973 devices are used in the automatic flight controls for the U.S. Air
Force B-1.
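
The size reductions quoted above can be restated as ratios with a few lines of arithmetic. The sketch below is only a worked example of the figures in the text; it treats the "less than 30 cubic centimeters" of the 1968 design as exactly 30, so the resulting ratios are lower bounds, and the variable names are illustrative.

```python
VOLUME_1958_CC = 930.0   # circuit-card volume of the 1958 subassembly
VOLUME_1968_CC = 30.0    # upper bound quoted for the 1968 modules

# 1973 hybrids: one-fifth to one-tenth of the 1968 volume.
volume_1973_max_cc = VOLUME_1968_CC / 5
volume_1973_min_cc = VOLUME_1968_CC / 10

reduction_1968 = VOLUME_1958_CC / VOLUME_1968_CC
reduction_1973_low = VOLUME_1958_CC / volume_1973_max_cc
reduction_1973_high = VOLUME_1958_CC / volume_1973_min_cc

print(f"1958 -> 1968: at least {reduction_1968:.0f}:1")
print(f"1958 -> 1973: roughly {reduction_1973_low:.0f}:1 to {reduction_1973_high:.0f}:1")
```

On these figures the 1973 hybrids occupy some 3 to 6 cubic centimeters, a reduction of more than two orders of magnitude from the 1958 starting point.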

3.0 REVIEW OF SENSOR PROGRESS IN AUTOMATIC FLIGHT CONTROL SYSTEMS

A. Overview of Sensors

Progress and trends in flight control sensors are not as easy to track as the electronics. System
partitioning has been changing in the past two decades so that sensors that were once part of the flight
control systems may now be included within other avionics subsystems. The vertical gyroscope was once
the basic autopilot sensor. The attitude-heading reference, which may still use a vertical gyroscope,
remains a primary reference for all transport aircraft today, but these reference subsystems are no longer
included as part of the autopilot. By the mid-1950's, autopilots based on the single degree of freedom,
rate measuring gyroscope found their way into all-attitude combat aircraft applications. By the 1960's,
the rate gyro became a key sensor for all automatic flight control type systems. It was used in
stability augmentors, rate command augmentation systems and as inner loop damping devices for attitude
autopilots.

Air data sensors are another example of devices which were once part of the automatic flight
controls but eventually grew into independent subsystems. The autopilots of the 1950's contained pitot and
static source plumbing connections. They measured static pressure changes from a reference altitude for
use in altitude hold cruise modes. They also measured airspeed for use in control law gain adjustment.
As transport autopilot guidance modes grew in sophistication to include vertical speed control, airspeed
hold and select, Mach hold and select and altitude pre-select, the air data devices contained within the
autopilot grew into veritable air data computers. Indeed, in such aircraft as the DC-6 and DC-9, the
air data computations performed by the autopilot were expanded into air data computers that supplied
display and control information to other subsystems as well as to the autopilot. In most cases, however,
the aircraft's air data requirements were consolidated into separate air data computers which supplied
flight controls and other systems with their required data.

The subject of functional partitioning is important if we wish to enquire into reliability trends
in flight control systems, because statistical MTBF data for different flight control systems may or may
not include the sensors within the flight controls. For example, a review of USAF Manual 66-1
reliability data shows that a vertical gyroscope is a significant failure contributor to some flight control
systems (References 2, 3), but airline data on flight control system reliability generally separates
vertical gyroscope failure data from that of the automatic flight controls. A similar observation may
be made regarding air data devices. Nevertheless, it would be informative to review the technology
trends in flight control sensing devices from the standpoint of their impact on reliability.

Table  2 summarizes  the  various  sensing  devices  used  by  flight  control  systems.  The  following  is  a 
brief  review  of  trends  in  each  type. 

TABLE 2

FLIGHT CONTROL SENSORS

Sensor Type                Flight Control Function
-------------------------  ------------------------------------------------------------------------
Rate Gyroscope             Body axis pitch, roll and yaw rate measurement - used for stability
                           augmentation systems, command augmentation and source of damping for
                           attitude autopilots

Displacement Gyroscope     Attitude reference (vertical gyro) pitch and roll attitude; heading
or Platform                reference (magnetically slaved directional gyroscope)

Linear Accelerometers      • Lateral - turn coordination
                           • Normal - pitch command augmentation systems (blended with pitch rate);
                             inertial smoothing of air data derived vertical speed
                           • Longitudinal - inertial smoothing of airspeed measurements for
                             autothrottle controls

Angular Accelerometers     Integrated to obtain body axis rate measurement - replace rate gyros

Stick Force Sensors        Pilot control input for command augmentation and control wheel steering
                           systems

Air Data Sensors and       • Altitude, altitude deviation, airspeed, vertical speed and Mach number
Computers                    for vertical guidance modes
                           • Airspeed and Mach for gain programming

Miscellaneous Position     Position measurement of primary control surfaces, stabilizer trim, flaps,
Sensors                    wing position, etc. using LVDTs, synchros; for control law adjustment,
                           gain control, mode control, etc.

Miscellaneous Guidance     ILS, Tacan, VOR, DME, Radio Altimeter, etc. for flight path guidance
Sensors                    modes

B.  Attitude  Rate  Sensing 

1. Rate Gyros

The single degree of freedom, spring-restrained rate gyroscope used for control applications has not
changed much in the past 15 years. The miniature gyro, fully or partially floated in a viscous fluid,
was introduced in the late 1950's. The 1976 devices are about the same size or only slightly smaller
than the 1958 units. The main difference is in the self-test improvements which have been incorporated
(precise torque coils and wheel speed monitors). The technology emphasis has probably been directed at
cost reduction rather than at reliability improvement, although there have been some notable reliability
advances. Rate gyros with some penalty in size are available with demonstrable MTBFs of 50,000 hours
prior to the onset of an increased failure rate probability resulting from wear out. To operate at the
50,000-hour MTBF level at all times requires preventive maintenance after 10,000 or 20,000 hours. Such a
preventive maintenance requirement, however, is viewed as a serious liability. Indeed, maintenance on
any mechanical device, especially a delicate gyroscopic device, is a key weakness in the reliability of
all flight control systems. In U.S. Air Force experience (References 2, 3), large discrepancies in the
attained MTBF of a given part can be attributed to the quality of the maintenance actions. Typical rate
gyros in USAF inventory yield closer to 600 hours MTBF than 50,000 hours.

2.  Derived  Rate 

The acknowledged problems of maintenance and the historically poor reliability achieved by gyroscopic
devices stimulated a search for other techniques of aircraft angular rate measurement. One partial
solution to the elimination of the rate gyroscope is a return to the techniques used two decades
ago. The autopilots of the early 1950's derived pitch and roll rate electronically from the displacement
gyroscope's measurement of the pitch and roll Euler angles with respect to a local vertical coordinate
frame. Systems still in use in the B-52 and C-130 aircraft are vestiges of that era. When autopilot
system gains and control authorities were increased, the resultant expansion of the closed loop system
bandwidth exposed a basic defect of the derived rate technique. Excitation of the vertical gyro synchro
transducer was obtained from the aircraft's 400-Hz power supply. Poor regulation of that supply, and
especially voltage modulation in the 1.0 to 10.0 Hz frequency range, would be amplified by the derived
rate circuitry. This problem was most severe when pitch and roll attitude excursions were large. It
manifested itself as control surface jitter and, with parallel servos, the shaking of the column and
wheel would be quite objectionable. Various techniques were employed to minimize this problem, such as
electromechanical instrument servos of the type illustrated previously in Figure 3a. These instrument
servos, in rather elaborate combinations, were used to accomplish signal summation electromechanically
so that the 400-Hz suppressed carrier signal that was susceptible to power supply noise was always at a
null. Since the noise was proportional to the magnitude of that signal, effective noise suppression
could be achieved by working around a null at all times.

By the early 1960's, the need to acquire attitude data from some platform configurations that might
interject a servo follow-up unit between the gyroscopic element and the autopilot pitch and roll signal
discouraged the derived rate approach, and most systems resorted to the single degree of freedom rate
gyroscope. By the latter 1960's, however, technology improvements arrived to restore the feasibility of
the derived rate approach. Not only did improved constant frequency alternators make up for most of the
deficiencies of the older motor driven, poorly regulated inverters, but the advent of microelectronic,
operational amplifier computation techniques made line voltage multipliers a practical compensation
technique for power supply noise. Flight control systems in the DC-10 and L-1011 aircraft, for example,
use derived rates for pitch and roll stabilization. The MTBF of the derived rate circuitry is between
200,000 and 1,000,000 hours, which is considerably better than one could obtain by adding any candidate
rate sensor. (It is noted that the attitude reference which is the source of the derived rate has a
much lower MTBF than the rate circuit electronics, but loss of an attitude reference shuts down the
control channel even if that channel had an independent source of attitude rate.)
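
The sensitivity of derived rate to excitation noise can be illustrated numerically. The following is a
minimal sketch and not a model of any aircraft installation. It assumes an idealized synchro whose
output equals attitude multiplied by a normalized excitation voltage, a 2 percent excitation modulation
at 5 Hz, a first-difference differentiator, and compensation by dividing out the measured line voltage.
All function names and parameter values are hypothetical.

```python
import math

def differentiate(samples, dt):
    """Derive rate from successive attitude samples by first difference."""
    return [(b - a) / dt for a, b in zip(samples, samples[1:])]

def peak_spurious_rate(theta_deg, compensate, mod_depth=0.02, mod_hz=5.0,
                       dt=0.001, n=1000):
    """Peak derived rate (deg/s) for a constant attitude theta_deg.

    The true rate is zero, so anything returned is noise caused by
    modulation of the (normalized) synchro excitation voltage.
    """
    excitation = [1.0 + mod_depth * math.sin(2.0 * math.pi * mod_hz * k * dt)
                  for k in range(n)]
    # Assumed ideal synchro: output scales with attitude and excitation.
    signal = [theta_deg * e for e in excitation]
    if compensate:
        # Assumed compensation: normalize by the measured line voltage.
        signal = [s / e for s, e in zip(signal, excitation)]
    return max(abs(r) for r in differentiate(signal, dt))

large_attitude = peak_spurious_rate(20.0, compensate=False)  # large excursion
at_null = peak_spurious_rate(0.0, compensate=False)          # working at a null
compensated = peak_spurious_rate(20.0, compensate=True)      # line voltage removed
```

Under these assumptions the spurious rate grows with the attitude excursion, vanishes at a null, and is
removed when the excitation is divided out, which is the behavior described in the text.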

3.  Vibratory  Gyros 

In the last decade, much work has been done to develop a rate sensor that does not depend upon a
high speed wheel. Since the announcement of the vibratory gyro in the 1940's and early 1950's
(References 4, 5), designers have worked to develop practical devices. Although not literally in the
non-moving part category, vibratory devices featuring the elimination of the high speed wheel offer a
potential MTBF of 50,000 to 100,000 hours. They suffer from poor null characteristics, but in such
applications as yaw dampers, where high pass filters in the control law are used to exclude the
steady-state yaw rate, the poor null deficiency can be tolerated. Although there have been no widespread
applications of vibratory rate sensors, they have been demonstrated (References 6, 7, 8) and might find
use in future systems.

4.  Angular  Acceleration  Measurement 

In addition to derived rate techniques, the main competitor to the vibratory gyros for reliable rate
sensing without a high speed wheel is the angular accelerometer. Practical versions of such instruments
are finding their way into airline transport autopilots. Autopilots for the DC-9-50 aircraft now
incorporate angular accelerometers. Some I'll aircraft have been similarly equipped. These devices suffer
from an inability to accurately define the zero or null attitude rate, but in their applications thus
far they are used where the attitude rate can be compensated through a high pass filter (washout). It
would be difficult to apply these devices to rate command augmentation systems, where good definition of
the zero rate point is essential.
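
The way a washout removes a poorly defined null can be sketched with a discrete first-order high pass
filter. This is an illustrative sketch only: the 1 second time constant, the 0.01 second sample period,
the 0.5 unit null offset and the function name are assumptions, not values taken from any of the systems
mentioned.

```python
def washout(samples, dt, tau):
    """First-order high pass (washout) filter, tau*s / (tau*s + 1),
    discretized with a backward difference."""
    a = tau / (tau + dt)
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = a * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out

dt = 0.01
null_offset = 0.5
biased = [null_offset] * 2000          # 20 s of sensor null offset, zero true rate
filtered = washout(biased, dt, tau=1.0)
```

The constant offset passes through initially and then decays toward zero, so the damper sees only
changes in rate. The same property is what prevents use in a rate command system, where the steady
value itself is the quantity being controlled.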

Angular acceleration measurement and integration of the resulting signal to produce attitude rate
is a technique that has been used for more than two decades. Rather than employ an angular
accelerometer, which is an extremely difficult instrument to build in a small package, pairs of linear
accelerometers were used. (Linear accelerometers summed differentially will read the angular
acceleration of the line joining them; the larger the separation, the higher the effective angular
acceleration signal gradient.) A yaw damper mechanized in this way was used in the U.S. Navy A-3D
aircraft in the mid-1950's, and all the autopilots in the DC-8 aircraft measure pitch, roll and yaw
angular acceleration in this manner. In addition to the DC-8 aircraft, the Convair 990 and 880 aircraft
were equipped with similar 3-axis angular acceleration based attitude stabilization systems
(References 9, 10, 11). The linear accelerometers used in these applications were of the simplest open
loop type, consisting only of an E-pickoff and a pair of flexure springs supporting the sensitive mass.
It is difficult to find any record of removal of these accelerometers in the last ten years, and it is
estimated that their MTBFs are in excess of 100,000 hours.
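
The differential summing of two linear accelerometers can be written out as a short calculation. The
sketch below assumes a rigid body, accelerometers sensing perpendicular to the line joining them, and
simple rectangular integration; the function names and numerical values are hypothetical.

```python
def angular_acceleration(a1, a2, separation):
    """Angular acceleration (rad/s^2) from two linear accelerometers.

    a1, a2: accelerations (m/s^2) sensed perpendicular to the line joining them.
    separation: distance (m) between the two accelerometers.
    """
    return (a1 - a2) / separation

def integrate_rate(alpha_samples, dt, initial_rate=0.0):
    """Rectangular integration of angular acceleration into rate (rad/s)."""
    rate = initial_rate
    rates = []
    for alpha in alpha_samples:
        rate += alpha * dt
        rates.append(rate)
    return rates

# Rigid body assumption: a_i = a_cg + alpha * r_i, so the common
# linear acceleration a_cg cancels in the difference.
a_cg, alpha = 3.0, 0.5
r1, r2 = 1.0, -1.0
a1 = a_cg + alpha * r1
a2 = a_cg + alpha * r2
estimate = angular_acceleration(a1, a2, r1 - r2)
rates = integrate_rate([estimate] * 100, dt=0.01)   # 1 s of constant alpha
```

For a given angular acceleration the difference a1 - a2 is proportional to the separation, which is the
signal gradient effect noted in the text.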

C.  Attitude  Sensors 

The trends in attitude sensor size and weight have not followed the electronic trends in regard to
miniaturization. In the last two decades, gyroscopic technology advances were in the area of performance
improvements required for inertial navigation rather than for flight controls. In the early 1950's, a
vertical gyroscope that provided roll and pitch references for an autopilot weighed about 3.2 kg. A
modern day, wide-bodied transport aircraft that uses a vertical gyroscope rather than an inertial
platform as its source of attitude data has not enjoyed any apparent benefits of miniaturization because
its vertical gyroscope unit weighs about 6.8 kg. The unit, however, is packed with electronics for
monitoring, signal isolation, erection control and other new features which are demanded by the more
sophisticated avionics systems of the 1970's. Reliability of such units is very dependent upon the
quality of maintenance. U.S. Air Force data in References 2 and 3 reveal vertical gyro MTBFs ranging
from 250 to 900 hours, and these are for gyros that are considerably less complex than the type used in
modern transport aircraft. Historically, inertial platforms have shown a much poorer MTBF than the
vertical gyroscope. The airline experience with vertical gyros indicates an MTBF ranging from 4,000 to
10,000 hours depending upon maintenance quality and method of calculating MTBF (Reference 12). There is
some controversy regarding the calculation of MTBF for gyroscopic devices on the basis of flight hours
or operating hours. Typical airline experience shows a ratio of operating hours to flight hours of
greater than 2:1. (Higher values of MTBF are seen when operating hours rather than flight hours are used
in the calculation.) It is noted, however, that most reliability data on modern day automatic flight
control systems do not include attitude reference failures as part of the flight controls.
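
The effect of the hour basis on a quoted MTBF is simple arithmetic, sketched below. The fleet hours and
failure count are invented for illustration and are not taken from Reference 12; only the 2:1 ratio of
operating to flight hours comes from the text.

```python
def mtbf(hours, failures):
    """Mean time between failures for a given exposure and failure count."""
    return hours / failures

flight_hours = 100_000.0           # hypothetical fleet exposure
operating_to_flight_ratio = 2.0    # ratio quoted in the text (at least 2:1)
failures = 20                      # hypothetical removals for cause

mtbf_flight = mtbf(flight_hours, failures)
mtbf_operating = mtbf(flight_hours * operating_to_flight_ratio, failures)
```

The same failure history yields an MTBF twice as high when operating hours are used, so the basis must
be stated before figures from different sources are compared.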

D.  Accelerometers 

Modern units are miniature designs of the feedback type in which integral electronics control the
torquer which captures the sensitive mass and processes the signal that measures the acceleration.
Reliability histories obtained from field data are often inconclusive because of problems which may have
been unrelated to the accelerometer mechanism but caused by the application. Manufacturers of
accelerometers are willing to guarantee 100,000 hour MTBFs, but quite often the associated electronics
detract from the intrinsic reliability of the accelerometer mechanism. Nevertheless, the reliability
trend in accelerometer technology for flight control applications is favorable.

E.  Stick  Command  Sensors  (Force  or  Position) 

These sensors, which are used for control augmentation or control wheel steering inputs to
control computers, are mechanized today with strain gauge transducers or LVDT transducers. The
transducer devices themselves have MTBFs of well over 100,000 hours. The associated electronics, and
sometimes installation design deficiencies which aggravate alignment and redundant channel tracking
accuracies, are the usual detractors from the 100,000 to 500,000 hour MTBF potential.

F.  Air  Data  Sensors 

When air data sensors are used exclusively for flight control computation, their relative simplicity
has resulted in typical MTBFs of 5000 to 10,000 hours for the complete set of air data computations used
by a sophisticated autopilot system. However, when the air data computations are centralized so that a
central computer provides all the air data needs of the avionics computers, displays and flight controls,
the growth in complexity has led to a decline in MTBF to typical values which are below 1000 hours
for commercial aircraft and less than one half that value for some military applications. The trend to
digital air data computers, which eliminate the complex electromechanical computing and calibration
devices that typified the analog air data computers of the 1960's, has reversed this decline. Newer
central air data computers can be expected to yield MTBFs in the 1500 to 2500 hour range, depending upon
complexity and maintenance factors.

G.  Miscellaneous  Guidance  Sensors 

In the compilation of system reliability data, these devices are normally included in categories
other than flight controls. Nevertheless, most contemporary automatic flight control systems for
transport aircraft include many guidance modes, such as the automatic approach and landing functions,
which depend upon such sensing elements as VHF Nav receivers (VOR, localizer, glideslope), radio
altimeters and the various controllers and data entry devices and panels associated with setting up the
navigation/guidance problem. This group of devices tends to have operational MTBFs which are of about
the same magnitude as the autopilot computers (Reference 13, for example). Consequently, the failure of
these radio devices and associated controllers can reduce the probability of equipment availability for
an automatic landing to about the same extent as it is reduced by a malfunctioning autopilot computer.
In recognition of this fact, there have been recommendations that simplified landing system receivers
be developed for exclusive use of the autoland system.
4.0 SAFETY AND REDUNDANCY AND THEIR INSATIABLE APPETITE FOR ELECTRONICS

A. Safety Monitors

With the progress in miniaturization shown in the earlier figure, one is led to inquire how the overall
automatic flight control systems have fared in regard to size. There is little basis for comparison with
the earlier systems because the scope of functions provided by the automatic flight control systems has
been expanding at a continuously increasing rate. This escalation in functional scope appears to be a
regenerative process whereby each new function sows the seeds for a future growth requirement. Let us
briefly review how this situation came to pass.

Early autopilots were pilot relief devices. They were allowed an absolute minimum of control
authority so that they could be overpowered by the pilot with ease and would not cause any serious
disturbance if they failed hardover. As their potential for aircraft stabilization and precise flight
path control was recognized, it was apparent that performance could be improved with increased static
and dynamic authority. This led to a conflict with safety considerations, and the protective devices
sometimes referred to as Safety Monitors arrived on the scene in the early 1950's. Figure 5a illustrates
how the simple mechanization of an acceleration limit detector served as a protective device for an
autopilot.

The criterion of failure is excessive acceleration, 1.2 incremental g for example. If the autopilot
fails hardover, the monitor disengages the system when 1.2g are reached. This system has a natural
inclination to grow more complex because it does not do a very good job. If we wait until 1.2g are
reached to trip the monitor, we may reach 2.2g by the time disengagement occurs. What if normal control
maneuvers reach 1.2g? We do not wish to have nuisance disengagement, so we measure the command and
modify the g threshold accordingly. What if the 1.2g occur because of turbulence transients? We can
compensate for this by measuring and computing whether the autopilot is correcting for the error or
aggravating the error. This leads to the more sophisticated Safety Monitor of Figure 5b, where
additional sensors and monitor points provide the needed intelligence for failure detection.
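
The growth from the simple limit detector to the compensated monitor can be expressed as two small
decision functions. This is a sketch of the reasoning in the paragraph above, not the mechanization of
Figure 5a or 5b. The sign convention (positive surface command produces positive g), the function names
and the test values are assumptions.

```python
def simple_monitor(incremental_g, limit_g=1.2):
    """Trip when measured incremental acceleration reaches the limit."""
    return abs(incremental_g) >= limit_g

def compensated_monitor(incremental_g, commanded_g, surface_cmd, limit_g=1.2):
    """Trip only when the uncommanded acceleration exceeds the limit and
    the autopilot output is driving it further in the same direction.

    Assumed sign convention: positive surface_cmd produces positive g.
    """
    uncommanded_g = incremental_g - commanded_g
    if abs(uncommanded_g) < limit_g:
        return False                          # within command-adjusted threshold
    return surface_cmd * uncommanded_g > 0.0  # aggravating, not correcting

# Commanded maneuver: 1.3 g measured, 1.0 g of it commanded.
commanded_maneuver = compensated_monitor(1.3, commanded_g=1.0, surface_cmd=0.1)
# Turbulence: large g, but the autopilot is opposing it.
turbulence = compensated_monitor(1.5, commanded_g=0.0, surface_cmd=-0.2)
# Hardover: large g and the autopilot output is driving it.
hardover = compensated_monitor(1.5, commanded_g=0.0, surface_cmd=0.2)
```

Each added input (the command, the autopilot output) corresponds to an additional sensor or monitor
point, which is how the monitor acquires its own appetite for electronics.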
Figure 5a

Simple Safety Monitor

Figure 5b

More Sophisticated Safety Monitor

B. The Dual, Fail-Passive Systems

Soon the requirements for the Safety Monitor become more stringent. Is a transient as high as 1.2g
satisfactory, especially if the aircraft is difficult to handle or flying near the ground? Can we
reduce the failure transient to 0.5g, 0.1g, or 0.05g? The answer is always yes if we are willing to
build a more sophisticated monitor. The ultimate evolution of the Safety Monitor is the Dual
Fail-Passive System shown in Figure 6. It determines whether the autopilot's control action is correct
by sensing the same information, performing the same computation and commanding a response of its own.
It does this autonomously of the original control channel. It is, in effect, a complete duplication of
the single channel autopilot. To assure a minimum failure transient, the output of the monitor channel
can drive a physical servo that actually opposes the failure in the other channel. The criterion of
failure detection is a disagreement of the two outputs. The response to a failure detection is total
disengagement. What have we created? In the interest of safety, we have introduced redundancy and have
thereby reduced reliability by a factor of greater than two-to-one! Here is an enigma that contradicts
the instinctive feeling that reliability and safety are consequences of each other.
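
The two-to-one reliability penalty follows directly if the function is lost whenever either channel
fails. The sketch below assumes constant failure rates, so that rates add for elements in series, and
uses invented MTBF values including a hypothetical comparator; it illustrates the argument rather than
any particular system.

```python
def series_mtbf(*mtbfs):
    """MTBF of a function that is lost when any one element fails,
    assuming constant (exponential) failure rates that add."""
    return 1.0 / sum(1.0 / m for m in mtbfs)

single = series_mtbf(1000.0)                  # single channel autopilot
dual = series_mtbf(1000.0, 1000.0)            # either channel failing disengages
dual_with_comparator = series_mtbf(1000.0, 1000.0, 10000.0)  # plus comparator
```

Duplicating the channel halves the MTBF of the autopilot function, and the comparator and disengage
logic push the factor beyond two, even though each failure is now passive.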

Now let us examine some of the technological consequences of the monitoring concept illustrated in
Figure 6. The main design challenge relates to establishing the failure threshold criteria. Should
the channel comparator be set to indicate a failure for 1.0, 5.0, 20 or 50 percent discrepancy between
channels? Obviously, the answer depends upon the normal tolerances of a control channel. If we wish
to minimize failure transients, the comparator trip levels must be set for small discrepancies. This
necessitates tight channel tracking and high accuracy. Indeed, this is the only reason for high accuracy
in an automatic flight control system. The control law of a closed loop control process can usually vary
±30 percent, statically and dynamically, with little change in performance. Accuracy in the control
process is defined by the sensitivity of the sensors and the absence of null offsets in the sensors and
control electronics. However, for accurate monitoring, an order of magnitude increase in the precision
of the electronics is required. The advances in analog electronic technology during the 1960's
permitted the attainment of this type of accuracy improvement.
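
The link between channel tolerance and comparator trip level can be made concrete with a worst-case
calculation. The sketch assumes two healthy channels that may sit at opposite ends of their tolerance
band and expresses everything as a fraction of nominal output; the function names and the 2 percent
figure are hypothetical.

```python
def worst_case_disagreement(tolerance):
    """Largest disagreement (fraction of nominal) between two healthy
    channels, each within +/- tolerance of nominal."""
    return 2.0 * tolerance

def nuisance_trip_possible(tolerance, trip_level):
    """True if two healthy channels could disagree by the trip level."""
    return worst_case_disagreement(tolerance) >= trip_level

# Channels built only to the +/-30 percent a control law tolerates:
loose = nuisance_trip_possible(tolerance=0.30, trip_level=0.05)
# Channels built an order of magnitude tighter:
tight = nuisance_trip_possible(tolerance=0.02, trip_level=0.05)
```

A 5 percent trip level is unusable with control-grade tolerances but workable once the electronics are
roughly ten times more precise, which is the accuracy demand the monitoring concept creates.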
Figure 6

The Safety Monitor Grows to the Dual
Fail-Passive System


C. The Fail-Operative Systems

Concurrent with the evolution of a monitoring channel into a completely redundant channel, there
were growing demands for increased mission reliability. An aircraft that is committed to automatic
control during flight-critical situations cannot tolerate the shutdown of the system in the event of a
malfunction. In such automatic control modes as low level terrain following or all weather automatic
landing, shutdown of the system even for a short instant is not permissible. The need for proven
solutions to this type of problem has been the principal deterrent to the introduction of fly-by-wire
controls and their acknowledged advantages. Hence, the need for the fail-operative or fault correcting
system arose. Its simplest mechanization from a conceptual point of view is the direct growth from the
dual fail-passive system of Figure 6. By taking two dual fail-passive systems, we have the brute force
growth to the dual-dual fail-operative system of Figure 7a. Either channel A or B can perform the
control task and each channel monitors itself with its redundant monitor. If channel A fails, it shuts
down, gracefully recentering itself, and permits channel B to continue the control task. There are a
number of complications suggested in Figure 7a. First, one might not wish to build quad redundant
hydromechanical servos because of weight and hydraulic power redundancy considerations (although such
systems are operational today). Thus, electronic models of the servo are often used for monitoring
purposes. Also, two completely autonomous, active control channels cannot control the same aircraft
without certain types of run-away problems. The two channels must be equalized or slaved to each other.
This is a dangerous requirement because it can permit cross-channel failure propagation.




In principle, it should not require four channels to produce a fail-operative system. A common
monitor channel can be used to identify the malfunctioned channel by a two out of three voting procedure.
Figure 7b shows this triple redundant fail-operative configuration, but it also shows another interesting
innovation that has been used in the design of such systems. The input to each servo or servo model
channel is always a valid control signal and is always identical in all channels. This feature is
provided by a circuit identified as a voter in Figure 7b. It is often referred to as a Mid Value Signal
Selector circuit. It only transmits the middle value of the three channel signals. If one channel fails
hardover, that failure would be completely suppressed by the mid-value circuit. Such a device permits
some relaxation in tolerance tightness because it acts as a signal collection node that eliminates all
tolerance errors up-stream. The down-stream monitoring of the servo system, therefore, need only be
concerned with servo system tolerances. Also, failure detection circuitry up-stream of the voting node
may have wider thresholds because the failure suppression is inherently and instantaneously provided by
the voting circuit. But a note of caution: any node that represents a convergence of multichannel data
is vulnerable to cross channel failure propagation which can allow a single failure to wipe out a
multichannel system. These nodes, therefore, must be properly buffered, monitored, tested, and treated
with the utmost care.
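
The failure suppression property of mid-value selection is easy to demonstrate. The sketch below is a
software analogy of the selection rule only (the signal values and function name are hypothetical) and
says nothing about the buffering and isolation that the analog circuit requires.

```python
def mid_value_select(a, b, c):
    """Return the middle of three channel signals."""
    return sorted((a, b, c))[1]

healthy = mid_value_select(1.00, 1.02, 0.97)        # normal tolerance spread
hardover_high = mid_value_select(1.00, 1.02, 25.0)  # third channel hardover positive
hardover_low = mid_value_select(-25.0, 1.02, 0.97)  # first channel hardover negative
```

In every case the output is one of the healthy signals, with no detection delay, and the same selected
value can be fed to all down-stream channels.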

The problem of safety at the multi-channel convergence nodes becomes even more critical in the
so-called (fail-operative)² or double fault correction system. After any first failure, it must continue
to operate normally and reject any second failure without transient disturbances or performance
degradation. In the latter 1960's, development programs for such systems were underway. The motivation
was fly-by-wire. Two programs typified how the analog technology of the late 1960's coped with the
(fail-operative)² requirement for military and commercial operations: the USAF 680J Survivable Flight
Control System in the F-4 aircraft (Reference 14) and the U.S. Supersonic Transport which was being
built by Boeing (Reference 15). The complexity of the electronic mechanizations in these two programs
was largely dictated by safety considerations. Where safety constraints were more severe, the electronic
solution resulted in greater complexity (and hence reduced reliability in the sense that the probability
of equipment failures had increased). Reference 16 shows how the more stringent requirement for
protection of the signal voting node in the SST system resulted in a voter circuit that used four times
as many components as was required to mechanize the voting circuit used in the F-4 Survivable Flight
Control System. This complexity growth is in addition to a factor of 2 to 3 increase in the complexity
of a 4-channel voter circuit over a 3-channel voter.


D.  Complexity  Growth  in  Analog  Fail-Operative  Systems 


The subject of the voter circuitry is important because it became a key issue in the design
philosophies which were incorporated into the fail-operative type systems being developed in the latter
1960's. The voting or signal selection circuit is a generalization of the majority logic algorithm
illustrated in Figure 8a for three discrete inputs A, B, and C. The digital majority logic mechanism is
usable directly as a signal selector when the signal is transmitted as pulse width modulation (F-111
Automatic Flight Control System, for example). In the more typical case of contemporary analog control
systems, the signal is dc or suppressed carrier ac. In such cases, a circuit which selects the most
positive signal is substituted for the "or" gate and a circuit that selects the most negative signal is
substituted for the "and" gate. When there are four channels, several other arrangements of the same
elements allow the selection of one of the two middle values. One such arrangement is shown in Figure
8b. That version, mechanized with nine amplifiers plus additional circuitry, is used as a most negative
mid-value signal selector in the L-1011 flight control system. Circuits that perform this type of signal
selection appear to be ideal methods of combining the signals from redundant sensing or computation
channels. They should have allowed the mechanization of system architectures that employ sectionalized
redundancy to enhance mission success. For example, consider a quad redundant system consisting of 4
sensors, each having a failure probability of $Q_1$, and 4 computers, each having a failure probability
of $Q_2$. Using the binomial distribution formula (Reference 17):

$$P_s = \sum_{k=0}^{r} \frac{n!}{k!\,(n-k)!}\, P^{\,n-k}\, Q^{k}$$

where

r = number of allowable failures (3 for quad redundant system)
n = number of channels (4 for quad system)
P = probability of individual channel success
Q = probability of individual channel failure = 1 - P

It is seen that cross strapping the individual sensors to each computer yields a $P_s$ of:

$$P_s = (1 - Q_1^4)(1 - Q_2^4) \approx 1 - (Q_1^4 + Q_2^4)$$

while connecting each sensor to only its associated computer channel yields a $P_s$ of:

$$P_s = 1 - (1 - P_1 P_2)^4 \approx 1 - (Q_1 + Q_2)^4$$

with $P_1 = 1 - Q_1$ and $P_2 = 1 - Q_2$ the individual sensor and computer success probabilities.

[Figure 8a block labels: Maximum Selector, Minimum Selector.
For discrete logic: D = (A + B) · (A + C) · (B + C).
For analog mid value selection: D = MIN{MAX(A,B), MAX(A,C), MAX(B,C)}.]

Figure 8a

Generalization of Majority Logic to Analog
Mid Value Selection

Figure 8b

Typical 4-Channel Mid Value Voter
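
The substitution of maximum and minimum selectors for the "or" and "and" gates can be checked in a few
lines. The three-input functions follow the expressions of Figure 8a. The four-input function is only
one possible arrangement of the same elements, offered as an assumption; it is not claimed to be the
nine-amplifier circuit of Figure 8b.

```python
from itertools import combinations, product

def majority(a, b, c):
    """Discrete majority logic: D = (A or B) and (A or C) and (B or C)."""
    return (a or b) and (a or c) and (b or c)

def analog_mid_value(a, b, c):
    """Analog generalization: 'or' becomes most positive (max),
    'and' becomes most negative (min)."""
    return min(max(a, b), max(a, c), max(b, c))

def lower_mid_value(signals):
    """Most negative of the two middle values of four signals, built from
    pairwise maximum selectors feeding one minimum selector (assumed layout)."""
    return min(max(x, y) for x, y in combinations(signals, 2))

# The gate expression agrees with a two-out-of-three vote for all inputs.
majority_ok = all(majority(a, b, c) == (a + b + c >= 2)
                  for a, b, c in product([False, True], repeat=3))
triplex = analog_mid_value(0.3, -0.1, 7.0)     # 7.0 is a hardover channel
quad = lower_mid_value([0.3, -0.1, 7.0, 0.4])  # middle values are 0.3 and 0.4
```

With four inputs there are two middle values, so the designer must choose which one to transmit; the
sketch selects the more negative of the two.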

If the sensor and computer have equal failure probabilities ($Q_1 = Q_2$), then the ratio of isolated
channel to cross strapped channel failure probability is 8:1. Even if the sensor has only one-tenth the
failure probability of the computer ($Q_1 = 0.1\,Q_2$), the ratio of isolated to cross strapped failure
probability is 1.46:1.
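
The 8:1 and 1.46:1 ratios can be reproduced numerically. The sketch below assumes independent failures,
a quad system that survives as long as at least one sensor and one computer (cross strapped) or at
least one complete sensor-computer channel (isolated) remains, and an arbitrary failure probability of
0.001; the function names are hypothetical.

```python
from math import comb

def success_probability(n, r, q):
    """Binomial probability that no more than r of n channels fail,
    each channel failing independently with probability q."""
    p = 1.0 - q
    return sum(comb(n, k) * p ** (n - k) * q ** k for k in range(r + 1))

def cross_strapped_failure(q1, q2, n=4):
    """System fails if all n sensors fail or all n computers fail."""
    return q1 ** n + q2 ** n - (q1 ** n) * (q2 ** n)

def isolated_failure(q1, q2, n=4):
    """A channel is lost if its sensor or its computer fails;
    the system fails when all n channels are lost."""
    q_channel = q1 + q2 - q1 * q2
    return q_channel ** n

q = 1.0e-3   # assumed computer failure probability
ratio_equal = isolated_failure(q, q) / cross_strapped_failure(q, q)
ratio_tenth = isolated_failure(0.1 * q, q) / cross_strapped_failure(0.1 * q, q)
```

For small failure probabilities the ratios approach 16/2 = 8 and (1.1)^4 = 1.46, in agreement with the
figures quoted in the text.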

Despite this apparent advantage, sectionalized redundancy through cross strapping of sensors was
generally not implemented in the fail-operative redundant systems of the late 1960's and early 1970's.

Mechanization complexities (not visible to the reliability theoreticians who calculated the
advantages using the above type of analyses) discouraged this practice. The mechanization device was the
signal selector or voter. In addition to the requirement that voting circuits provide proper electrical
isolation to prevent a failure at the voter from propagating back into the input signal sources, faults
of input signals must be detected and reported, and in many cases (the quad voter, for example) the
voter circuit must be reconfigured to cope with the surviving inputs. Perhaps the most serious
contributor to complexity growth was the need for special self-test circuitry. Because of the voter
circuit's inherent failure suppressive properties, it suffers from latent failure vulnerability. Not
only does it suppress failed input signals, it suppresses many of its own failures. When such latent
failures exist, the circuit acts very normally until it is actually needed to perform its failure
suppressive role. One of the input channels goes hardover and the mid-value circuit, because of its
undetected latent failure, transmits the failed signal rather than the good ones. Moreover, the failure
detection logic is confused and it shuts down the wrong channel. Obviously, such a situation is frowned
upon even though the probability of such a combination of events is very remote. How does one correct
for this problem? We must add a means of exercising the internal structure of the mid-value logic
circuit so that any latent failures can be detected and reported. This added complexity often involved
more circuitry than we had in the device being tested; and then, the test circuitry failure
peculiarities must be considered in great detail.

A complexity divergence became apparent to designers of analog redundant systems in the latter
1960's. Figure 9 was developed by analyzing the number of circuit components used for redundancy
management functions in typical redundant system designs of the 1968 era. The figure shows that the
circuitry needed to perform the control and logic functions associated with aircraft stabilization and
control becomes an insignificant part of the total electronics. The redundancy management electronics
which provided the circuitry for accuracy enhancement, fault isolation, fault reporting, and built-in
test became the dominant part of the systems. The more sophisticated of these systems began to
incorporate small digital computers to perform detailed pre-flight checkout of the redundancy management
electronics. This checkout was essential to maintaining mission success probability because the
redundancy management system was effectively a fault correcting mechanism which tended to mask failures.
The latent failure, such as that discussed above for the voter failure, had to be uncovered by a form of
preventive maintenance. The vulnerability of a quad redundant system to such types of failure is
illustrated in Figure 10. This figure is reproduced from Reference 16 which analyzed the latent failure
effects of a quad redundant system with each channel having a 1600 hour MTBF. The figure shows
probability of a one hour mission success when the in-flight failure monitoring capability is only 90
percent and various degrees of pre-flight Built-In Test (BIT) thoroughness are employed. It is seen that
without the thorough BIT performed as a pre-flight, mission reliability deteriorates excessively.
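
The way latent failures accumulate between thorough checks can be sketched with a deliberately
simplified model. This is not the analysis of Reference 16. It assumes a constant failure rate, treats
one channel in isolation, and supposes that each pre-flight BIT finds a fixed fraction of whatever
latent failures are present. The 1600 hour MTBF and 90 percent in-flight monitoring figures come from
the text; the function name, the BIT fractions and the count of 1000 flights are illustrative
assumptions.

```python
import math


def latent_failure_probability(mtbf_hours, monitor_coverage, bit_thoroughness,
                               flights, flight_hours=1.0):
    """Probability that one channel carries an undetected failure at take-off."""
    # Rate of failures that the in-flight monitors do not see.
    undetected_rate = (1.0 - monitor_coverage) / mtbf_hours
    p_new = 1.0 - math.exp(-undetected_rate * flight_hours)
    q = 0.0  # channel assumed fully healthy at the start
    for _ in range(flights):
        q = q + (1.0 - q) * p_new          # latent failures arising during a flight
        q = (1.0 - bit_thoroughness) * q   # fraction surviving the next pre-flight BIT
    return q


results = {
    bit: latent_failure_probability(1600.0, 0.90, bit, flights=1000)
    for bit in (0.0, 0.5, 0.95, 1.0)
}
for bit, q in results.items():
    print(f"BIT thoroughness {bit:4.2f}: latent failure probability {q:.2e}")
```

Under these assumptions the probability of an unseen failure grows steadily when no pre-flight BIT is
performed and remains small when the BIT is thorough, which is the qualitative behaviour the figure is
described as showing.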

The final strains on analog system complexity were reached in this 1967-1970 era. In the
experimental quad-redundant (fail-operative) systems, separate digital computers were being employed to
perform this BIT. In the USAF Survivable Flight Control System in the F-4 (References 14, 18), a separate
BIT digital computer and associated Maintenance and Test Panel were used for this function. In the NASA
Digital Fly-By-Wire system in the F-8, the triplex, back-up electronics unit incorporated a special
purpose digital computer within its control and display panel to perform the pre-flight BIT that tested
the triplex electronics as well as the primary and back-up redundant hydraulic actuators. In the U.S. SST
program, the quad redundant fly-by-wire and stabilization electronics (Reference 15) were going to be
tested by the general purpose digital computers which formed the nucleus of that aircraft's digital
autopilot.

E.  Digital  Systems 

One message became very clear to the redundant system designers of the latter 1960's. As we tried
to guarantee a redundant system's safety by correcting for circuit or logical defects in the monitoring
and redundancy management, more hardware had to be added. Adding hardware, in turn, complicated the BIT
and its interface with the remainder of the system. This complexity spiral could be eliminated if we
built the systems around a general purpose digital computer. That computer would not only perform the
control law and logic functions needed to stabilize and control the aircraft but it could perform the
monitoring, redundancy management and BIT without adding any more hardware than what is needed for the
basic control function. Once the interfaces with the sensors and actuators are established, system test,
fault isolation and reporting can be accomplished in the software. Such systems have been demonstrated
in commercial transport applications (Reference 19) and military aircraft applications (Reference 20).
These demonstrations have verified the breakthrough in built-in system testing, fault isolation and fault
reporting which can be obtained in a properly designed digital system. The potential advantages that
these new systems can provide in the area of maintenance management should not be underestimated. One of
the problems that plagues the maintenance of contemporary analog redundant flight control systems is a
monitor activated disconnect in flight which cannot be traced to any faulty component when the aircraft
returns. The digital system can record, into a non-volatile memory, the specific cause of an in-flight
disengagement so that ground maintenance actions can concentrate on the isolated faulty component. Since
much of the historically poor reliability record in avionics is attributed to improper LRU removal and
consequent maintenance actions on the wrong box, the properly designed digital system offers promise of
alleviating this problem.


Figure 9.  Effect of Redundancy Configuration on Equipment Complexity (axis: relative complexity)

Figure 10.  Effect of BIT Testing on Probability of Mission Failure


CONCLUSIONS 

In two decades of dramatic technology advances, flight control systems have exploited these advances
by  adding  new  functions  at  a rate  which  tended  to  outstrip  the  potential  for  reliability  improvement.  In 
the  past  twenty  years,  electronic  miniaturization  has  permitted  two  orders  of  magnitude  of  size  reduction 
for  a given  control  function,  but  in  practice  the  growth  in  requirements  has  resulted  in  flight  control 
electronics  which  are  about  one-fourth  the  size  of  the  1956  equipment. 

How overall flight control system reliability has fared in the presence of these technology
advances is difficult to assess because maintenance procedures tend to have more influence on reported
MTBF than much of the design technology used to improve reliability. Complete autopilot systems
that were designed 20 years ago and are still in use today often show field MTBFs of under 50 hours. An
MTBF of 100 hours for these early systems (including vertical gyros and electromechanical actuators) was
considered good. Commercial transport flight control systems designed in the latter 1960's, when well
maintained, typically give MTBFs of about 400 hours (less vertical references and actuators). These
same systems, if mechanized with 1958 technology, would consume more than 10 times the volume and result
in an MTBF of much less than one-tenth the values achieved in the 1970's.

The strongest impetus to the increasing complexity of systems originating in the latter 1960's was
redundancy (motivated primarily by safety) and its unique requirements for built-in test. Such systems
pressed the best of contemporary analog circuit technology to the limit and set the stage for the
development of the digital flight control systems which appeared by the mid-1970's. The main contribution
of the digital system from the standpoint of reliability will be its ability to handle monitoring,
redundancy management and built-in test without adding more hardware. This offers significant promise in
the area of maintenance management since maintenance procedures have been shown to be the most
significant factor in field MTBF for both commercial and military systems.

SUMMARY

The  airworthiness  requirements  for  the  certification  of  automatic  landing  systems  in  civil 
aircraft  include  an  explicit  statement  of  the  safety  level  to  be  achieved.  For  compliance  with 
these  requirements  a safety  assessment  of  the  system  must  be  made,  and  accepted  by  the  airworthiness 
authority.  It  must  contain  a logical  analysis  which  identifies  all  critical  failure  conditions 
of  the  system  and  shows  that  the  probability  of  each  is  appropriate  to  the  degree  of  hazard 
associated  with  it.  It  should  also  examine  the  factors  which  influence  the  performance  of  the 
system  and  show  by  means  of  analysis,  simulation  and  flight  testing  that  the  safety  level  will  be 
acceptable.  The  analysis  will  establish  the  maintenance  checks  necessary  together  with  their 
frequency,  and  any  other  limitations  on  the  use  of  the  system. 

1.  INTRODUCTION 

In  the  United  Kingdom  the  Civil  Aviation  Authority  is  responsible  for  ensuring  that  all 
aeroplanes  registered  in  the  country  are  airworthy  by  virtue  of  both  their  design  and  their 
maintenance.  This  responsibility  is  discharged  by  ensuring  compliance  with  a published  code  of 
airworthiness  requirements.  Before  an  aeroplane  receives  a Certificate  of  Airworthiness, 
without  which  it  may  not  operate,  there  is  an  investigation  of  the  design  against  the  yardstick 
of  the  requirements.  In  practical  terms  the  CAA  prepares  the  requirements  and  provides 
interpretative  material  while  the  manufacturer  is  obliged  to  carry  out  whatever  testing  and 
analysis  is  necessary  to  demonstrate  compliance  to  the  satisfaction  of  the  CAA.  The  aim  of  this 
paper  is  to  describe  those  particular  requirements  which  refer  to  automatic  landing  systems, 
their background and application. To date the following aeroplane types have been approved
by CAA for automatic landing against these requirements: BAC 1-11 and SVC-10, Hawker Siddeley
Trident, Boeing 747, Lockheed L-1011, Concorde.

2.  ACCIDENT  RATES  AND  ACCEPTABLE  RISK 

In  regulating  airworthiness  it  is  important  to  realise  firstly  that  "complete  safety"  can 
never  be  achieved,  and  secondly  that  safety  is  a commodity  with  a price.  Thus  the  level  of 
safety  will  be  very  closely  related  to  the  money  spent  on  it,  and  in  turn  that  money  must  come 
out of the cost of the ticket to the passenger. In seeking to determine whether aviation safety
levels are acceptable or not, it is fruitless to make comparison with other modes of transport.
Aviation can be shown to be "better" or "worse" than motor cars or trains depending on the chosen
index (i.e. number of deaths per passenger-mile, or passenger journey etc.). The only useful
indication of acceptability is the fact that the public flocks in its millions each year to fly
on the world's airlines, in spite of the publicity always accorded to aeroplane accidents.
This suggests that aviation safety is at a level which is generally accepted as reasonable in
relation to the cost of the ticket.

However,  the  average  layman  is  not  able  to  assess  the  risk  or  the  factors  that  influence  it, 
so  that  it  is  a responsibility  of  the  authorities  to  ensure  that  safety  is  maintained  overall  and 
that  it  is  reasonably  uniform.  It  is  also  generally  agreed  that  there  should  be  a steadily 
increasing  level  of  safety,  and  this  for  three  main  reasons: 

(a)  to  preserve  or  improve  the  position  of  aviation  in  relation  to  other  means  of  transport 

(b)  so  that  there  is  no  increase  in  the  number  of  accidents  occurring  in  any  one  year,  even 
though  there  is  a continual  growth  in  the  volume  of  aviation 

(c)  so  that  the  safety  of  a new  aeroplane  type  matches  the  safety  of  that  which  it  is  replacing 
(typically  the  safety  of  a type  improves  as  it  matures  and  has  its  major  problems  resolved). 

This is reflected in the fact that for world-wide scheduled air transportation (excluding China
and the U.S.S.R.) the number of fatal accidents per million hours flown has decreased progressively
from a value of about 4.0 in the early 1950's to 2.^ for 1974. The latter figure corresponds to a
value of 2.9 fatal accidents per million flights. It is worth noting that these two indices are
the most commonly used in aviation since they have the merit that they relate easily to
operational  records  and  equipment  failure  rates,  and  are  also  a measure  of  the  risk  to  the 
individual  passenger  that  he  will  be  in  a fatal  accident. 

3. AIRWORTHINESS REQUIREMENTS


Most  airworthiness  requirements  are  rather  dogmatic  statements  specifying  a minimum  or 
maximum  value  for  a particular  parameter  or  characteristic  of  the  aeroplane.  Although  they 
appear  to  be  entirely  arbitrary  with  no  mention  of  probabilities  or  statistical  distributions, 
many  have  as  their  basis  a rational  calculation  to  give  a particular  level  of  safety.  For 
example,  the  aeroplane  is  required  to  survive  encounters  with  gusts  of  certain  magnitudes,  which 
are specified from consideration of their probability. Again, the climb capability of an
aeroplane with an engine failure in various phases of flight is based on the probability of such a
failure, and the statistical variability of climb gradient for other reasons. However, requirements
can only be derived in this way when there is sufficient background and experience to establish
the statistics with confidence. When, in the early 1960's, airframe and equipment manufacturers
asked us to state the airworthiness requirements which we would apply to an automatic landing system,
and particularly to its use in near "blind" conditions, we had little relevant experience from which
we could derive requirements. Furthermore there was a danger that if we attempted to write
detailed requirements of the conventional kind we might inadvertently place unnecessary constraints
on the design of systems, so inhibiting inventive design. In consultation with industry we
decided  that  we  should  state  the  requirements  in  terms  of  safety  levels  to  be  demonstrated  by  a 
suitable  safety  assessment,  backed  by  the  necessary  analysis  and  testing.  The  choice  of  safety 
levels,  and  the  form  of  the  safety  assessment  will  be  discussed  in  detail  in  the  paragraphs  that 
follow.  Essentially  the  consequences  of  system  failures  must  be  considered,  and  also  the  effect 
of  variations  in  system  performance  when  there  is  no  failure  present. 

Although  the  declared  objective  of  the  assessment  is  to  show  that  the  specified  safety  level 
is  achieved,  in  practice  it  is  the  critical  and  logical  scrutiny  of  the  system  which  is  of  most 
value,  and  not  the  precision  of  the  numerical  conclusions. 

Safety  assessment  techniques  are  being  more  and  more  widely  applied  in  the  certification  of 
aeroplanes,  and  indeed  in  some  cases  to  problems  which  are  not  as  amenable  to  numerical  analysis  as 
is  an  automatic  landing  system. 

4.  SAFETY  LEVELS  REQUIRED  FOR  AUTOMATIC  LANDING  SYSTEMS 

The  main  purpose  of  introducing  automatic  landing  systems  was  to  permit  operations  in  much 
lower visibilities than had been possible hitherto, with the ultimate objective of landing virtually
"blind". The principle was established that the average risk in these new operations should be
the same as the overall safety level for all landings in all weathers. In fact if this were
achieved,  the  net  result  would  be  a modest  improvement  in  safety  because  some  of  the  more  marginal 
of  the  existing  operations  would  be  made  safer. 

In  addition  it  was  considered  necessary  to  limit  the  risk  which  could  be  taken  on  a specific 
flight  so  that  an  unduly  hazardous  landing  would  not  be  undertaken  even  though  average  risk 
considerations  would  permit  it.  The  maximum  risk  associated  with  using  the  system  was  limited  to 
approximately the total average risk for a complete flight. This was arbitrary but recognised
that a decision to divert is likely to incur the risk arising from a further flight from the
alternate back to the original destination.

In  short,  the  airworthiness  requirements  are  stated  in  two  forms  - average  risk,  and  the 
specific  risk  at  the  last  point  in  the  intended  flight  from  which  a safer  diversion  can  be  made. 

The  following  is  an  extract  from  Reference  1 

"Average  Risk 

The  system  shall  be  such  that  the  total  fatal  landing  accident  rate  (i.e.  average  risk) 
due  to  the  use  of  the  system  at  any  time  and  in  the  new  visibility  conditions  permitted  below 
current minima (approx. 200 ft and ½ mile) shall not be greater than the present total fatal
landing  accident  rate  for  all  transport  aircraft.  This  figure  is  believed  to  be  of  the 
order  of  one  fatal  accident  per  million  landings.  Since  piloting  is  only  one  of  several 
possible  causes  of  fatal  landing  accidents,  the  system  should  not  contribute  a rate  greater 
than $1.0 \times 10^{-7}$ fatal accident per landing.

Specific  Risk  - Risk  on  a Particular  Flight 

Take-off 

No  flight  shall  be  started  with  the  intention  of  making  a landing  using  the  system  if 
conditions  are  expected  to  exist  such  that  the  risk  of  a fatal  accident  due  to  the  use 
of the system is greater than $3 \times 10^{-6}$.

In  the  foreseeable  future,  the  probability  of  en-route  failure  in  the  system,  and  the 
probability  of  deterioration  in  weather  to  a state  where  the  system  cannot  be  used,  is 
likely  to  be  such  that  it  will  not  be  possible  to  meet  this  take-off  criterion  without 
providing  for  an  en-route  option  to  divert  to  an  alternative  destination.  In  this  case, 
compliance with the $3 \times 10^{-6}$ take-off risk will consist in ensuring that a suitable
alternate is available and that adequate provision is made for the pilot to receive
and use such information as he may require to make the decision whether to divert or
not. In specifying the alternate, account should be taken of the probable weather
conditions  there,  and  the  probable  state  of  system  serviceability.  Adequate  fuel 
reserves  should  be  carried. 

Risk after Take-off

Where the use of the system depends on the provision of an alternate destination
the decision to divert should be taken, if during the flight circumstances arise such
that the risk of a fatal accident due to the use of the system exceeds $3 \times 10^{-6}$ and
provided that the diversion is likely to be a safer course of action.

The crew must not be expected to calculate the risk but must be given guidance
in the form of limitations on such factors as aeroplane or equipment
unserviceabilities (including the elements of the system), weather at the destination, and the
state of the runway. When limitations are being derived, each one shall be set such
that, with that parameter on its limit, the average flight landing accident risk
would be less than $3 \times 10^{-6}$. It would be permissible to continue to use the system
even when several parameters were known to be near their limits, provided they have
an unrelated effect. So far as is practicable, however, guidance should be given to
the crew so that they would avoid using the system if several parameters were near
their limits, and all affected the system in a similar way, e.g. if all tended to
increase the variability in touchdown position."

5. AUTOMATIC LANDING SYSTEMS

An automatic landing system contains a number of sub-systems and equipments distributed
throughout the aeroplane and many of these are used for purposes other than automatic landing.
The system design varies enormously from one aeroplane type to another, but essentially it consists
of

(a) sensors, normally at least the following

    ILS glide slope and localiser receiver,*
    radio altimeter,
    pitch, roll attitude and rate,
    air data computer (airspeed and vertical speed),
    lateral and longitudinal accelerometers,
    compass,

(b)  computation  and  control  actuators  to  enable  the  aeroplane  to  "couple"  to  the  ILS 
localiser  and  glide  slope  during  the  approach,  to  control  the  airspeed,  to  remove  the 
drift  angle  due  to  cross-wind  before  touchdown,  to  reduce  the  descent  rate  before 
touchdown  ("flare")  and  to  control  the  landing  run  to  the  runway  centreline. 

(c)  switches,  instruments,  warning  lights,  indicators  to  enable  the  pilot  to  control  and 
monitor  the  operation  of  the  system. 

Equipment may be duplicated where there is a need to detect a failure and shut the system
down without causing an unwanted manoeuvre to the aeroplane (fail-passive). Further redundancy
is incorporated when there is a need to ensure continued operation after a failure
(fail-operational).

6.  SAFETY  ASSESSMENT 

Because  the  requirement  is  simply  a statement  of  the  risk  level  to  be  demonstrated,  there  is 
an automatic obligation on the manufacturer to produce a safety assessment. Experience has shown
that an analysis which simply sets out to examine the effect of all component failures, tolerances,
environmental conditions with the infinity of possible combinations will rapidly become unwieldy
and ultimately unworkable. It is much more practicable to start at the output end of the system
and to consider what hazardous effects the system can generate. It is then possible to allocate
an appropriate proportion of the permitted $10^{-7}$ risk to each effect and to work back through the
system  finding  all  the  ways  by  which  such  an  effect  can  come  about.  This  list  of  hazardous 
effects,  and  the  permitted  probability  for  each,  may  be  taken  as  the  "airworthiness  objectives" 
for  the  system. 


* The ILS ground station is approved separately as operating to specified failure and
performance characteristics. It comprises:

(i) an ILS localiser transmitter providing azimuth guidance to and along
the runway.

(ii) an ILS glide slope transmitter providing a 3° descent path to the
touchdown zone of the runway.

6.1. Airworthiness Objectives

Airworthiness objectives may be derived by considering the "freedoms" the system enjoys.
In simple terms an automatic landing system which is functioning incorrectly may cause the
aeroplane to have:

(i)  Incorrect  position,  e.g.  such  that  it  strikes  an  obstacle  on  the  approach,  or 
lands  off  the  runway 

or 

(ii)  Incorrect  rate  of  change  of  position,  e.g.  such  that  the  rate  of  descent  at  landing, 
or  the  lateral  cross  track  velocity,  causes  structural  damage 

or 

(iii) Incorrect attitude at landing, e.g. high bank angle which causes the wing tip to
touch or a high pitch angle causing the tail to scrape.

By no means all the hazardous effects can be regarded as certain fatal accidents or
catastrophes. Even when there is a high probability of fatality, there may be a sufficient
chance of a safe outcome that credit can be taken for it in the analysis. This would allow
the probability of the occurrence to be greater than if it were regarded as being certain to cause
a catastrophe. CAA has carried out a study of past landing accidents and derived approximate
values for the probabilities that given types of accident will be fatal. These values have been
used in Safety Assessments prepared to date and are listed at Appendix 1.

Another  mitigating  factor  for  which  credit  may  often  be  taken  in  an  analysis  is  pilot  action. 
The pilot may take over control to complete the landing manually or to abandon it. Any assumptions
as to his ability to intervene effectively must take full account of the way he is alerted to the
need to do so, and the situation at the time (flight path, trim, etc.). For example if the
automatic landing system were to cause the aeroplane to land at or near the edge of the runway, it
is accepted that the pilot can recognise this easily and take over when the system is being used in
good visibility, but that in limiting visibility conditions he cannot. Therefore a system which is
to be used in limiting conditions must be shown to be proof against this type of occurrence (or
to provide an early and effective warning in the flight deck).

Probably the most difficult objective to frame is that relating to autopilot cut-out or
loss of function, particularly at low height in the very lowest visibilities. If this occurs
immediately prior to touchdown with the flare virtually completed, the pilot might elect to continue
to a landing under manual control, but if it occurs earlier in the approach or flare he would have
to abort the landing and make a go-around. In either case it is difficult to estimate the risk
involved.

6.2.  Analysis  Tasks  and  Division  of  Risk 

Any  particular  hazardous  effect  may  arise  as  a result  of: 

(a)  a single  failure  or  combination  of  failures: 

the  safety  assessment  should  therefore  contain  a failure  analysis. 

(b)  the  performance  of  a failure-free  system  due  to  tolerances  and 
environmental  conditions: 

the  safety  assessment  should  therefore  contain  a performance  analysis. 

(c)  a combination  of  failures  and  performance: 

the  safety  assessment  may  need  to  examine  the  performance  of  the  system  in 
some  failure  modes. 

It may be helpful at this stage to take an example illustrating the framework into which the
performance and failure analyses must fit. If we consider the case of a landing in which the
vertical velocity at touchdown exceeds the design ultimate vertical velocity for the aeroplane, and
if we suppose that one thirtieth of the total $10^{-7}$ risk may be allocated to this cause then it must
be shown that the probability of a catastrophe due to this cause does not exceed
$10^{-7}/30 \approx 3 \times 10^{-9}$.

We  may  now  further  suppose  that  a catastrophe  is  equally  likely  to  arise  from: 

(a)  performance  without  any  failure, 

(b)  failures  without  any  performance  variability, 

(c) failures and performance, e.g. combining relatively probable failures with adverse
environmental conditions.

Each of these categories can then contribute $10^{-9}$ to the total probability of a catastrophically
hard landing.
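
The division of risk in this example can be followed as a short calculation. The sketch below is
illustrative only: the total permitted risk, the one-thirtieth share and the three equally likely
categories are taken from the text, while the variable names are assumptions of the sketch.

```python
TOTAL_SYSTEM_RISK = 1.0e-7      # permitted fatal accident rate per landing due to the system
HARD_LANDING_SHARE = 1.0 / 30   # share supposed in the text for a catastrophically hard landing
CAUSE_CATEGORIES = ("performance", "failures", "failures and performance")

# Budget for a catastrophe caused by excessive vertical velocity at touchdown.
hard_landing_budget = TOTAL_SYSTEM_RISK * HARD_LANDING_SHARE

# The three categories are supposed equally likely, so each takes one third.
per_category_budget = {
    category: hard_landing_budget / len(CAUSE_CATEGORIES)
    for category in CAUSE_CATEGORIES
}

print(f"hard landing budget: {hard_landing_budget:.1e}")
for category, budget in per_category_budget.items():
    print(f"{category}: {budget:.1e}")
```

The exact quotients are slightly above the rounded values quoted in the text, which is consistent with
the view expressed earlier that the logical scrutiny matters more than numerical precision.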

In this particular case pilot intervention does not provide any alleviation since CAA
experience is that pilots do not reliably detect from visual cues that an automatic landing flare is
about to result in a hard landing. In fact, if the system fails to flare the aeroplane the pilot
is unlikely to intervene early enough to correct the situation unless he is given a positive
warning to do so.

On the other hand a landing exceeding the ultimate strength of the undercarriage may be regarded
as having catastrophic consequences on one in ten occasions (Appendix 1). Therefore, in the
airworthiness objectives it will be apparent that a landing which is hard enough to break the
aeroplane (>12 fps) may have probabilities not exceeding:

$10^{-8}$ per landing due to failures, e.g. failure to flare without warning,

$10^{-8}$ per landing due to performance, e.g. the effect of an adverse gust,

$10^{-8}$ per landing due to failures combining with performance.

For systems to be used in the lowest visibilities, both the failure analysis and the performance
analysis must also assess the probability of loss of system function or "cut-out" (sometimes referred
to as its "integrity"). CAA has accepted safety assessments which assume safe pilot take-over at the
10”^ to 10~^ level in these conditions. Therefore to meet an overall 10~^ criterion the system
cut-out rate should be about 10’^ taken over the last half minute or so of the flight. In practice this
has proved to be extremely difficult to achieve and to demonstrate, even with fully fail-operational
systems.

The fact that some reliance can be placed on the pilot does ease the demands on the system and
the analysis by comparison with, say, a fly-by-wire system where the safety of the aeroplane relies on
continuing system operation. It will be a most challenging task to show that there is no sequence of
failures or even a remote common mode of failure which will defeat the redundancy built into such a system
to a probability of 10'^^ taken over the whole flight.

7. ANALYSIS

7.1. Failure Analysis


The  analysis  of  possible  failures  of  the  system  must  start  from  the  airworthiness  objectives  which 
establish  the  effects  that  are  critical  or  hazardous.  It  is  then  a reasonable  task  to  determine  the 
condition  of  the  system  required  to  produce  these  effects,  and  the  design  principles  which  should  be 
adopted  to  minimise  the  likelihood  of  their  occurrence.  In  particular  the  design  principles  should 
define  the  precautions  taken  to  limit  the  effects  of  failures  (e.g.  sub-system  segregation,  authority 
limiters) and the extent to which it is necessary to provide for continued operation after failures.

The  analysis  should  take  the  form  of  a logical  examination  of  all  the  individual  failures,  and 
combinations  of  failures,  which  can  lead  to  the  condition  of  the  system  under  consideration.  It 
should  also  contain  a statistical  evaluation  of  the  probabilities  of  the  failures  to  show  compliance 
with  the  objectives. 

In carrying out the statistical part of the analysis logical and careful thought must be given to
the time periods which are significant to the particular probability being computed. For example a
failure may only be able to cause a hazard during the last half minute or so of the flight. However,
that failure may be dormant up to this point and may only declare itself during the landing. Thus
in evaluating the probability that the failure will be present during the landing full account must
be taken of the whole period back to the last point at which the system element was checked and found
to be healthy. Consequently, an important product of the failure analysis is the maintenance
procedure for the system since it must show that the checks are sufficiently comprehensive, and
sufficiently frequent for system safety. To illustrate the point in a very simplified way, suppose that
for a particular system the mean time between failures in the flare computation which lead to a hard
landing is about 10,000 hours. If this part of the system is checked automatically at the start of the
approach then the risk period over which it can fail is of the order of two minutes, i.e. the probability
of a landing with this failure would be approximately

$\frac{2}{60 \times 10{,}000} = 3.3 \times 10^{-6}$

which does not meet the $10^{-8}$ target.

There  would  therefore  be  a need  for  a second  channel  to  "monitor"  the  first  so  that  in  the  event 
of  a failure,  the  disagreement  between  channels  would  cause  the  system  to  disengage  and,  provided  the 
pilot  is  given  an  unmistakable  warning,  he  can  take  over  and  make  a safe  landing.  The  probability 
of both channels failing following the approach check is negligible (approximately $10^{-11}$), but of
course  that  is  not  the  end  of  the  matter.  It  is  also  necessary  to  ensure  that  the  approach  check 
sequence  itself  can  be  relied  on.  This  will  almost  certainly  require  that  a ground  check  be  carried 
out  at  some  regular  interval. 
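
The arithmetic of this example can be checked with a short sketch. The function name and the optional exponential form are illustrative and do not come from the text; the sketch assumes a constant failure rate of 1/MTBF and, for the two-channel case, that the channels fail independently.

```python
import math

def exposure_failure_probability(exposure_hours, mtbf_hours, exact=False):
    """Probability that a unit fails during an exposure (risk) period.

    Assumes a constant failure rate of 1/MTBF (an assumption of this sketch).
    """
    ratio = exposure_hours / mtbf_hours
    if exact:
        return 1.0 - math.exp(-ratio)
    return ratio  # linear approximation, adequate when ratio << 1

# Figures quoted in the text: two-minute risk period, 10,000-hour MTBF.
single_channel = exposure_failure_probability(2.0 / 60.0, 10_000.0)

# Both channels failing in the same period, assuming independent failures.
both_channels = single_channel ** 2

print(f"single channel: {single_channel:.2e}")
print(f"both channels:  {both_channels:.2e}")
```

The same function illustrates why the check interval matters: a dormant failure that is only checked every 100 flying hours has an exposure period 3,000 times longer than the two minutes assumed here.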

In  general  the  failure  analysis  is  concerned  with  satisfying  the  average  risk  criterion,  and  it  is 
only  occasionally  that  there  is  a need  to  consider  the  specific  risk  for  the  particular  flight.  That 
only  arises  when  there  is  a question  of  using  a system  with  a known  failure. 

The following gives some specific guidance on the detailed conduct of the failure analysis for a
safety assessment:

(a)  The system which is being analysed must be precisely defined (boundaries, interfaces,
modification standard, etc.).

(b)  Where the effect of a failure is not readily apparent, either the most adverse possible
consequence should be assumed, or such testing should be carried out as may be required
to establish the effect without doubt.


i 
(c)  A single failure may only be assessed as having a probability less than 10 based
on applicable service experience and analysis, or alternatively a detailed engineering
evaluation backed by testing.

(d)  A single failure may only be assessed as having a probability less than $10^{-9}$ when it
applies to a particular mode of failure (e.g. mechanical jamming) and it can be shown
that such a failure need not be considered as a practical possibility.

(e)  In systems which rely for their airworthiness on redundancy, particular attention
should be paid to common mode failures, i.e. multiple failures arising from a
single cause. Such a common cause might be:

local  fire 

electromagnetic  interference  or  electrical  transient 

mechanical  vibration 

leakage  of  water 

failures  of  cooling  systems. 

(f)  The  influence  of  other  systems  should  be  taken  into  account. 

(g)  When  the  failure  of  a device  can  remain  undetected  in  normal  operation,  the  frequency 
with  which  the  device  is  checked  will  directly  influence  the  probability  that  such  a 
failure  is  present  on  a particular  occasion. 

(h)  When  the  failure  can  be  expected  to  result  in  other  failures,  then  account  should  be 
taken  in  the  analysis  of  these  further  failures. 

(i)  The  features  which  are  shown  by  the  analysis  to  be  critical  should  be  reviewed  to 
determine  whether  modification  action  should  be  taken. 

7.2.  Performance  Analysis

Many  different  factors  affect  the  performance  of  a system,  and  to  some  remote  probability  a number 
of  them  may  combine  together  in  an  adverse  sense  sufficiently  to  cause  an  accident.  For  an  automatic 
landing  system,  it  is  the  factors  external  to  the  system  which  have  a dominant  effect  on  the  performance 
rather  than  tolerances.  These  are  generally  of  minor  significance  but  may  not  be  negligible.  The 
factors  which  matter  will  depend  on  the  performance  parameter  being  considered.  Wind  shears  and  gusts 
will  obviously  influence  lateral  as  well  as  vertical  flight  path  control,  but  noise  or  bends  on  the 
ILS  localiser  beam  will  only  affect  azimuth  control.  The  flare-out  manoeuvre  and  the  touchdown  impact 
are mainly influenced by wind shear and gusts, and by the profile of the ground immediately prior to,
and during, the landing. The ground profile effect arises because the main terms in the control law
are based on the height of the aeroplane wheels above the ground as measured by a radio altimeter, and
the flare computation is therefore sensitive to ramps, steps or bumps on the ground. However, this
is generally not a severe problem because the flare manoeuvre is largely carried out when the aeroplane
is over the paved runway surface. Noise, or "bends", on the ILS glide slope beam may also contribute,
but  generally  only  to  a small  extent. 

The  method  which  has  been  adopted  for  performance  analysis  consists  in  establishing  the  statistical 
distribution  of  the  parameter  being  considered,  and  then  determining  from  that  the  probability  that  it 
will  reach  a hazardous  level.  For  example  considering  the  hard  landing  case  again,  the  procedure 
would  be  to  establish  the  distribution  of  vertical  velocity  at  touchdown,  and  then  to  determine  that  it 
will not reach a value exceeding the design ultimate value (normally 12 ft. per second at maximum landing
weight)  with  a probability  greater  than  the  desired  value  of  10"®. 

Some  of  the  factors  which  influence  performance  are  "deterministic"  in  nature.  For  example,  all 
other  things  being  equal,  the  touchdown  point  along  the  runway  may  vary  with  the  centre  of  gravity  of 
the  aeroplane.  But  the  c.g.  itself  will  have  a statistical  distribution  so  that  the  overall  outcome  is 
a statistical  distribution  of  touchdown  position  as  affected  by  centre  of  gravity.  Other  factors  such 
as turbulence, or beam noise, are themselves essentially statistical and will also generate a
distribution of touchdown points, etc. All these individual distributions combine statistically to
generate the overall statistical distribution for the parameter in question. (For Gaussian
distributions with the same mean, the variance of the overall distribution is the sum of the individual
variances.)
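
A minimal sketch of this combination rule follows. It assumes the individual contributions are independent and Gaussian, and that the Gaussian shape holds out to the limit value, which is exactly the point that needs care in practice. The contribution names, standard deviations and mean sink rate are hypothetical; only the 12 ft per second limit is taken from the text.

```python
import math

def combined_sigma(sigmas):
    """Standard deviation of the sum of independent contributions
    (variances add, so the sigmas combine as a root sum of squares)."""
    return math.sqrt(sum(s * s for s in sigmas))

def exceedance_probability(limit, mean, sigma):
    """P(X > limit) for a Gaussian X with the given mean and sigma."""
    z = (limit - mean) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(z)

# Hypothetical contributions to touchdown vertical velocity scatter (ft/s),
# e.g. turbulence, ground profile, glide slope beam noise.
sigmas = [1.0, 0.8, 0.5]
mean_sink_rate = 2.5   # hypothetical mean touchdown vertical velocity (ft/s)
limit = 12.0           # design ultimate value quoted in the text (ft/s)

sigma_total = combined_sigma(sigmas)
p_hard_landing = exceedance_probability(limit, mean_sink_rate, sigma_total)
print(f"combined sigma = {sigma_total:.3f} ft/s")
print(f"P(vertical velocity > {limit} ft/s) = {p_hard_landing:.2e}")
```

Because the result depends entirely on the far tail of the distribution, a small error in any assumed sigma changes the answer by orders of magnitude.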

The  problem  of  course  is  to  establish  the  statistical  distributions  of  the  critical  parameters. 
Clearly  flight  testing  is  necessary,  but  limited  by  the  time  and  money  available,  and  certainly  not  to 
anything  like  the  levels  necessary  to  "prove"  probabilities  in  the  order  of  10"®.  Extensive  use  is 
therefore  made  of  simulations  with  statistical  models  of  the  distributions  of  the  disturbing  parameters, 
simulation  of  the  aeroplane  and,  either  simulation  of  the  automatic  landing  system  or  real  hardware 
units.  Whereas  flight  test  landings  will  be  numbered  in  hundreds  at  most,  the  simulation  runs  may  run 
to  thousands  or  hundreds  of  thousands.  This  is  still  regrettably  small  by  comparison  with  10"®,  so 
that the statistical distributions which have been measured must be extrapolated before reaching
conclusions (inferences) regarding safety. The confidence with which this can be done is
completely dependent on the quality and quantity of information which has gone into the modelling of the
critical  variables,  and  the  accuracy  of  the  aeroplane  and  system  simulation.  A clear  understanding 
of  the  system  and  the  factors  which  are  most  critical  to  it  will  enable  the  designer  to  concentrate 
his  attention  on  the  effects  of  the  more  extreme  conditions,  i.e.  the  "tails"  of  the  distributions. 
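
The gap between the number of runs that can be made and the probabilities to be shown can be illustrated with the standard zero-failure binomial bound. This is not a method described in the text; it is a sketch that assumes independent runs in which the hazardous event was never observed, and the function name and the 95 percent confidence level are illustrative choices.

```python
def zero_failure_upper_bound(runs, confidence=0.95):
    """Upper confidence bound on the event probability after `runs`
    independent trials with no occurrence of the event.

    Solves (1 - p)**runs = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / runs)

for runs in (300, 10_000, 100_000):
    bound = zero_failure_upper_bound(runs)
    print(f"{runs:>7} runs with no event: p < {bound:.1e} at 95% confidence")
```

Even a hundred thousand event-free simulation runs only bound the probability at a few parts in 100,000, which is why the measured distributions have to be extrapolated rather than counted.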

Although statistics gained from flight testing cannot be used on their own to establish the
performance of the system, it is of course a prime requirement that distributions derived from flight
testing should be compatible with and support the simulation work. Without that, there can be no
confidence in the conclusions of the analysis.

An  important  product  of  the  performance  analysis  is  a statement  of  the  limitations  to  be  observed 
in  the  use  of  the  system,  e.g. 

the  quality  of  the  ILS  localiser  or  glide  slope 
wind  speed  (cross,  tail  or  total) 

any  limitation  on  aeroplane  weight  or  centre  of  gravity,  additional  to  those 
applied  to  manual  landing. 

These  are  determined  as  the  maximum  value  of  the  particular  parameter  which  will  permit  compliance 
with  the  average  risk  criterion,  or  when  appropriate,  with  the  specific  risk  for  the  particular  flight. 

7.3  Failures  and  Performance 


In  describing  the  analysis  of  failures  (para.  7.1)  consideration  was  only  given  to  those  which  are 
certain  to  be  catastrophic,  and  those  which  have  a high  probability  of  being  so  (e.g.  a failure  to  flare 
at 1 : 10).

However,  there  is  a class  of  failures  which  have  less  severe  consequences  and  may  therefore  be 
permitted  to  occur  with  a higher  probability.  For  example  a failure  which  leads  to  a partial  flare 
may  leave  the  system  able  to  make  quite  satisfactory  landings  except  in  the  presence  of  the  more 
extreme  gusts.  For  example  if  a failure  of  this  nature  had  a probability  of  10“^  per  landing,  then 
it  would  be  necessary  to  analyse  the  performance  of  the  system  in  its  failed  state  to  determine  that 
the  probability  of  a hard  landing  is  less  than  10“^. 
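
The reasoning can be sketched as a product of two probabilities: the chance of the failure occurring, and the chance of a hard landing given that the system is in the failed state. The function names and all numerical values below are hypothetical and are chosen only to show the arithmetic; they are not the figures from the text.

```python
def risk_contribution(p_failure, p_hard_given_failed):
    """Hard-landing probability contributed by one failure state."""
    return p_failure * p_hard_given_failed

def max_allowed_conditional(p_failure, allocated_risk):
    """Largest P(hard landing | failed state) that keeps the contribution
    of this failure state within the risk allocated to it."""
    return min(1.0, allocated_risk / p_failure)

# Hypothetical figures for illustration only.
p_partial_flare = 1e-3   # probability of the failure per landing
allocated_risk = 1e-6    # hard-landing risk allocated to this failure

limit = max_allowed_conditional(p_partial_flare, allocated_risk)
print(f"P(hard landing | partial flare) must not exceed {limit:.1e}")
print(f"contribution at that limit: {risk_contribution(p_partial_flare, limit):.1e}")
```

The more probable the failure, the better the system must perform in its failed state, which is why the failed-state performance has to be analysed rather than assumed.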

Great  care  is  needed  to  ensure  that  this  does  not  add  enormously  to  the  amount  of  performance 
analysis  to  be  carried  out.  In  general  pessimistic  simplifying  assumptions  should  be  used  to  clear 
all  but  the  most  marginal  of  the  cases  which  have  to  be  considered. 

7.4.  In-Service  Proving 


It  has  become  the  practice  in  the  United  Kingdom  that  automatic  landing  systems  are  subjected 
to  a comprehensive  scrutiny  in  day-to-day  operations  before  they  are  released  for  use  in  the  lowest 
visibilities.  Formal  certification  for  low  visibility  operation  of  the  system  is  withheld  until  a 
period  of  in-service  proving  has  been  completed.  Analysis  of  pilot  reports  and  on-board  flight 
data  recordings  is  used  to  monitor  the  performance,  integrity  and  failures  of  the  system  in  line 
service.  There  are  two  main  channels  of  activity  - "routine"  and  "special  events".

Analysis of a large batch of landings is used to confirm that the performance of the system on a
day-to-day basis, and the reliability of the equipment, are compatible with the conclusions of the
safety assessment.

Any  landings  which  appear  to  fall  outside  the  expected  range  of  performance  variations,  and  any 
failure  effects  which  appear  to  violate  the  safety  assessment  are  treated  as  "special  events"  and  are 
subjected to specific and detailed examination. In many cases the incident is resolved satisfactorily,
but in a small number of cases the result of the investigation is a hardware modification, a changed
procedure or some amendment to the limitations on the use of the system. None of the systems proved
in service to date has escaped such action. There is no doubt that this period of in-service
proving  provides  an  essential  back-up  to  the  safety  assessment  procedure  for  automatic  landing  systems 
which  are  to  be  used  in  "blind"  conditions. 

8.  OTHER  SYSTEMS 


General  airworthiness  requirements  for  systems  also  contain  explicit  statements  of  acceptable 
probabilities  for  failures,  expressed  in  words  rather  than  numbers  and  categorised  according  to  the 
severity of their effect (Extremely Improbable, Extremely Remote, Remote, Probable, etc.). Acceptable
numerical interpretations have been specified and have been used as the basis for many safety
assessments over the past five years or so.


The  CAA  is  consulting  with  industry  on  draft  proposals  which  would  clarify  and  amplify  these 
requirements.  In  essence  these  proposals  provide  for:- 


(a)  design  review  of  a new  aeroplane  type  following  which  there  would  be  agreement  between 
CAA  and  the  manufacturer  on  the  systems  which  require  a safety  assessment  procedure,  and 
those  which  can  be  accepted  against  conventional  requirements, 

(b)  airworthiness objectives to be agreed for each safety assessment as a list of effects
associated with the system under consideration, and the allowable probability,


(c)  analyses  of  failures  and  their  effects,  including  combinations  of  failures, 

(d)  statistical analysis only to be carried out where the conclusions of the failure
analysis do not show obvious compliance with the airworthiness objectives, and
especially in the case of complex systems, new technology, etc.,

(e)  performance  analysis  of  those  systems  where  performance  variability  may  have 
critical  consequences, 

(f)  safety  assessment  documentation  adequate  to  ensure  that  throughout  the  life  of 
the  aeroplane  modifications  to  the  system  can  be  designed  and  implemented  in  such 
a way  that  the  airworthiness  objectives  continue  to  be  complied  with. 

9.  MANAGEMENT 

Finally,  it  is  worth  dwelling  briefly  on  the  management  of  a safety  assessment  programme.  Its 
most  important  contribution  to  safety  is  that  the  system  concerned  is  subjected  to  a systematic  and 
critical  analysis.  It  is  necessary  that  the  people  who  do  this  task  understand  the  system  and  its 
functions,  and  the  airworthiness  objectives.  They  should  be  engineers  with  a knowledge  of  the 
system,  not  statistical  mathematicians  or  computer  programmers.  It  is  the  analysis  of  the  system 
which  is  important  in  the  safety  assessment,  not  the  arithmetic.  Ideally  the  safety  assessment  team 
should  be  a group  of  people  separate  from  the  design  force  and  particularly  so  in  terms  of 
management.  As  far  as  possible  they  should  not  be  responsible  to  the  managers  of  the  design  team. 

Of  course,  the  design  team  should  also  be  involved  since  they  must  know  and  work  to  the 
disciplines and constraints imposed by the safety assessment. System "strategy" or "architecture"
can have a great influence on the size and complexity of the safety assessment task, so that where
possible  the  designer  should  always  consider  the  analysis  and  certification  programme  in  making  design 
decisions. 

Finally  there  can  be  no  doubt  that  the  top  management  of  the  design  organization  must  understand 
and  support  the  safety  assessment  programme,  and  must  be  prepared  to  implement  any  actions  it  calls  for. 

10.  CONCLUSION 

Automatic landing systems and their use in low visibility must be subject to a systematic safety
assessment programme to ensure an adequate design standard and to determine the limitations within which
the system may safely be operated, together with the necessary maintenance and operating procedures.

This  technique  can  be  applied  to  other  complex  or  novel  systems. 

APPENDIX 1

THE  RISK  OF  FATALITY  ASSOCIATED  WITH  A LANDING  INCIDENT 

It  is  recognised  that  many  incidents  may  not  necessarily  result  in  fatal  accidents.  Therefore, 
in  assessing  the  overall  safety  of  the  operation  the  probability  of  an  incident  may  be  weighted 
according  to  its  fatality  risk.  The  following  is  a list  of  incidents  and  values  for  the  fatality 
risk  in  each  case. 

These  values  are  considered  to  be  representative  of  past  experience,  and  are  normally  accepted 
if  used  in  the  estimation  of  risks  in  a safety  assessment.  However,  if  it  were  clear  that  some  new 
aeroplane  was  substantially  less  crashworthy  in  any  or  all  of  the  incidents  listed,  the  fatality  risks 
would  have  to  be  revised  for  that  aeroplane  and  those  incidents. 


Incident                                                              Fatality Risk

(a)  Aeroplane leaves the airspace which is guaranteed to be
     free of obstacles in either approach or go-around,               fatal
     or, if the excursion is small and of short duration.             1 : 30

(b)  Aeroplane touches down short of the runway, but not more
     than 200' short.                                                 1 : 30

(c)  Aeroplane touches down more than 200' short of runway.           fatal

(d)  Aeroplane touches down to side of runway but with wheels
     not more than 250' from centre line.                             1 : 30

(e)  Aeroplane touches down to side of runway, but wheels
     more than 250' from centre line.                                 fatal

(f)  Aeroplane runs off side of runway but wheels remain
     within 250' of centre line.                                      1 : 30

(g)  Aeroplane runs off side of runway with wheels more than
     250' from centre line.                                           fatal

(h)  Aeroplane runs off end of runway, not more than 200'
     from runway.                                                     1 : 100

(i)  Aeroplane runs off end of runway to more than 200' from
     runway.                                                          fatal

(j)  Aeroplane touches down harder than design ultimate vertical
     velocity.                                                        1 : 10

(k)  Aeroplane touches down with sufficient lateral tracking
     velocity or yaw to collapse undercarriage.                       1 : 10

(l)  Nosewheel or rear fuselage touch ground before main wheels.      non-fatal

(m)  Pod or propeller touch ground.                                   non-fatal

(n)  Wing tip touches ground after undercarriage.                     non-fatal

(o)  Wing tip strikes ground before undercarriage.                    fatal

(p)  Aeroplane stalls.                                                fatal

(q)  After initiating go-around aeroplane strikes the runway.         likely to vary from
                                                                      aircraft to aircraft
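
The weighting described at the start of this appendix can be sketched as a sum of incident probabilities multiplied by their fatality risks. The fatality risks below are taken from the list of incidents; treating "fatal" as 1 and "non-fatal" as 0 is an interpretation made for this sketch, the dictionary keys are illustrative names, and the incident probabilities are hypothetical.

```python
# Fatality risk per incident, from the list of incidents in this appendix.
# "fatal" is taken as 1.0 and "non-fatal" as 0.0 (an assumption of this sketch).
FATALITY_RISK = {
    "touchdown_short_within_200ft": 1.0 / 30.0,   # incident (b)
    "overrun_within_200ft": 1.0 / 100.0,          # incident (h)
    "hard_landing": 1.0 / 10.0,                   # incident (j)
    "wing_tip_after_undercarriage": 0.0,          # incident (n)
    "stall": 1.0,                                 # incident (p)
}

def weighted_fatal_risk(incident_probabilities):
    """Sum of P(incident) x fatality risk over the incidents supplied."""
    return sum(
        probability * FATALITY_RISK[name]
        for name, probability in incident_probabilities.items()
    )

# Hypothetical per-landing incident probabilities, for illustration only.
example = {
    "hard_landing": 1e-6,
    "overrun_within_200ft": 1e-6,
    "stall": 1e-9,
}
print(f"weighted fatal accident risk per landing: {weighted_fatal_risk(example):.2e}")
```

A simple sum of this kind assumes the incidents are rare enough that the chance of two occurring on the same landing can be neglected.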


The incidents defined above assume that the airfield meets the recommendations of ICAO
Annex 14 with regard to the Strip surrounding the runway. This extends 200' before the threshold
and to 500' from the centre line on either side. Of this only the first 250' either side has a
prepared surface, and the outer portions of the Strip are merely cleared of obstructions. Where
these recommendations are not met, the incidents would need to be re-defined and, for example,
if waiting aeroplanes or ground vehicles were permitted on the outer part of the Strip, the
incidents reflecting lateral displacement would need to be in terms of wing-tip and not wheel
displacement.

FUTURE  TRENDS  IN  HIGHLY  RELIABLE  SYSTEMS


James  I.  Arnold 
THE  BOEING  COMPANY 
3801  South  Oliver 
Wichita,  Kansas  67210 


SUMMARY 

The  need  for  highly  reliable  flight  control  systems  in  both  control  configured  vehicles  (CCV)  and  conventionally  designed 
aircraft  is  discussed.  Technology  trends  in  the  area  of  control  system  computation,  electronics,  sensors  and  actuation  are 
addressed. Increased use of digital computation and signal multiplexing in future control systems is considered inevitable.
Recent  technology  developments  in  high  density  electronic  packaging,  large  scale  integration  and  fiber  optics  will  be  applied 
to  achieve  highly  reliable  electronic  systems.  Component  designs  will  be  required  to  withstand  potentially  severe 
environments  in  the  presence  of  lightning  or  nuclear  phenomena.  Redundancy  management  will  continue  to  be  a prime 
driving  force  in  reliable  system  designs.  The  use  of  in-line  monitoring  to  limit  the  proliferation  of  redundant  channels  should 
find  application  in  future  systems.  Maintenance  and  preflight  self-test  systems  will  play  an  increasingly  vital  role  in  assuring 
the  integrity  of  redundant  flight-critical  systems. 


1.0  INTRODUCTION 

Current trends in aircraft research and development indicate that future aircraft will rely more heavily on flight control
systems to provide improved performance and more economical flight operations. Automatic control systems will assume
new pilot assist roles, and will be employed in the emerging active control technology concepts. These systems will be critical
for flight safety in some or all of the aircraft flight envelope. Implementation needs that result from these trends are increased
system reliability, lower cost, and size and weight improvements.

The required reliability will be attained through development of more reliable components and better redundancy
management techniques, and by using built-in test methods to monitor system status. Future flight control systems will rely
more on digital techniques and less on analog implementation methods to provide the required reliability. These systems will
employ fly-by-wire implementation, because of the increased versatility attainable, and will incorporate fiber optic techniques
because of the increased mission reliability and redundancy that is possible.

Significant  advances  have  been  made,  and  will  continue  to  be  made,  in  computer  parts  technology.  Despite  dramatic 
improvements  in  digital  technology,  further  reductions  in  cost  and  size  will  be  rather  modest  because  the  digital  portion  of  a 
digital  flight  control  system  represents  approximately  35  percent  of  the  required  functions.  Analog  electronics,  power 
supplies, wiring, etc., form the balance. Thus, further improvements are dependent upon analog technology advances, which
cannot be predicted as well as advances in digital technology.
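
The limit implied by the 35 percent figure can be shown with a short calculation. The sketch assumes, for illustration only, that cost and size are divided in the same proportion as the functions, so that the digital portion accounts for 35 percent of the total and everything else stays unchanged; the function name and the improvement factors are hypothetical.

```python
def remaining_fraction(digital_share, digital_improvement_factor):
    """Fraction of the original cost or size that remains when only the
    digital portion is reduced by the given factor."""
    unchanged_part = 1.0 - digital_share
    reduced_digital_part = digital_share / digital_improvement_factor
    return unchanged_part + reduced_digital_part

DIGITAL_SHARE = 0.35  # share quoted in the text

for factor in (2, 10, 100):
    fraction = remaining_fraction(DIGITAL_SHARE, factor)
    print(f"digital portion improved {factor:>3}x: {fraction:.1%} of the original remains")
```

However large the improvement in the digital portion, the total cannot fall below the 65 percent contributed by the analog electronics, power supplies and wiring under this assumption.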

Improvements in sensor technology have also been lacking during the last twenty years, making it difficult to predict future
technology advances. Rate and acceleration sensors used in aircraft of the 1950's are essentially the same as units being
installed in the F-16 fighter. Current research in sensor technology is directed toward improved reliability. The concepts being
considered for rate sensors eliminate the rate gyro spin motor and inherent bearing failures. Redundant skewed sensor
packages with redundant computers to process the data for all aircraft subsystems are also being investigated. Some of
these concepts are certain to be employed in highly reliable automatic flight control systems of the future.

Highly reliable flight control systems on future aircraft will use redundant actuators driven by high pressure hydraulics to
reduce component weight and space. The use of fly-by-wire will permit elimination of primary control linkages, rods and
cables. Redundancy configurations will be tailored to the individual aircraft requirements. Those being considered include
tandem cylinders, multiple cylinders arranged side-by-side along the surface hinge line, and combinations of parallel,
independent surfaces operated by individual, or multicylinder, actuators.

The  following  sections  discuss  future  applications  of  highly  reliable  control  systems  and  technology  trends  in  control 
system computation, electronics, sensors, and actuation systems to meet these application needs.

2.0  BACKGROUND 

In  the  future,  increased  dependence  will  be  placed  on  flight  control  systems  for  flight  phase  critical  functions,  such  as 
automatic  landing  and  guidance,  and  aircraft  configuration  dependent  functions,  such  as  active  control  systems  and 
fly-by-wire  (FBW).  Systems  falling  into  the  first  category  are  those  used  on  aircraft  that  are  basically  stable  throughout  the 
flight  envelope  without  the  system  operating,  but  the  short  term  effect  of  system  failure  in  certain  flight  phases  can  result  in 
loss  of  the  aircraft.  Those  falling  into  the  second  category  are  systems  used  on  aircraft  which  rely  on  the  systems  to  meet 
basic flutter, stability, or load design requirements during some or all of the flight envelope, and system failure would lead to
loss of the aircraft if it occurs in a critical flight condition.


Safety  of  flight  requirements  demand  improvements  in  system  reliability.  This  can  be  achieved  through  a combination  of 
improved component reliability and through extensive and effective redundancy to achieve fault tolerance.

2.1  Flight  Phase  Critical

Flight  control  systems  used  for  flight  phase  critical  functions  usually  perform  a pilot  assist  role;  that  is,  the  flight  crew  could 
perform  the  same  functions,  but  an  automatic  control  system  can  perform  the  task  more  precisely  and  with  greater  safety.  A 
primary  example  of  such  systems  is  the  automatic  landing  systems  employed  on  the  Boeing  747  and  other  commercial 
aircraft. 

The 747 automatic landing system includes the Sperry Rand SPZ-1 autopilot-flight director system and provides
fail-operate capability (Reference 1). This system has the capability of sustaining a failure while automatically guiding the
aircraft to touchdown using existing instrument landing system ground radio facilities. The performance, operational
procedures, safety and reliability of this equipment are consistent with requirements for operation under Category IIIA
weather conditions. Fail-operational autoland will be the basis for progress toward automatic landing capability under the
more severe Categories IIIB and IIIC weather conditions. To meet the requirements imposed by these categories, improved
ground-based radio references and an independent visual display system for the flight crew seem essential. But the most
significant requisites for implementation of all-weather landing capability are further experience with current autoland
techniques, continued equipment enhancement, and closer cooperation between manufacturers, airlines, pilots, and
regulatory agencies.

The National Aeronautics and Space Administration is conducting a long-term program called the Terminal Configured
Vehicle, or TCV, program, to address some of the major problems involving transport aircraft operating in terminal areas
calling for new or improved capabilities in airborne systems (Reference 2). This flight research program uses a Boeing 737
modified to incorporate an aft flight deck and advanced on-board electronic systems. The aft flight deck is a second cockpit,
installed in the passenger cabin, which will simulate a normal flight deck. The two man crew in the aft flight deck are able to fly
the airplane from takeoff through landing.

The  advanced  electronic  systems  include  a triply  redundant  digital  automatic  flight  control  system  and  a digital  navigation 
guidance  and  display  system,  including  cathode  ray  tube  displays.  The  systems  integrated  into  the  737  airplane  enable 
inflight  experiments  supporting  research  into  terminal  area  operations,  including  noise  abatement  approaches,  precision 
3-dimensional area navigation, and time navigation to reduce delays and fuel expenditures and to increase airport capacity,
and landing operations in conditions down to Category III. These experiments will lead to definition of the aerodynamic
and avionic features necessary to operate in future high-density terminal areas.

2.2  Aircraft  Configuration  Dependent 

Fly-by-wire and active control technologies have progressed to the point that they can be applied to prototype and
production aircraft (Reference 3). Analytical studies and flight demonstrations have generally proven the advantages of FBW
and active control systems in terms of aircraft performance.

Fly-by-wire increases the number and complexity of control functions that can be easily implemented. Implementation of
active control technology concepts does not depend on the adoption of fly-by-wire, but the greatest payoff for both is realized
when they are integrated. An airplane designed for maximum utilization of active control technology concepts will most likely
use FBW in the system implementation because of the flexibility available to the designer, such as in bringing together
signals to a common control surface from several sources in a straightforward manner.

Active control technology offers a new aircraft design philosophy that provides increased design freedom. The active
control functions that provide the most potential for improvement are:

• Flutter  Mode  Control 

• Maneuver  Load  Control 

• Ride  Control 

• Gust  Load  Alleviation 

• Augmented  Stability 

• Fatigue  Reduction 

All of these concepts except gust load alleviation were successfully flight demonstrated during the B-52 Controls
Configured Vehicle program. In addition, all of these concepts except flutter mode control have been individually committed
to production to some extent. A ride control system is being used on the 747 airplane and the B-1. The F-16 selected by the Air
Force for production uses augmented stability. A combined fatigue reduction and maneuver load control system is being
installed on the C-5A to extend the wing fatigue life.

The full potential of active control technology can be achieved only by incorporating the concepts in the initial design phase
of a new airplane. This approach involves integrating proven flight control technology into the aircraft configuration definition
on an equal basis with the design technologies of aerodynamics, structures and propulsion, as shown in Figure 1. Aircraft
configurations which integrate active controls in the design are dependent on the control system to meet design objectives.
For such an aircraft the control system must be as reliable as the structure to have a configuration competitive with conventional
aircraft.


Figure  1.  Active  Control  Technology  Airplane  Design  Cycle 

Future aircraft designs now being considered that will require active control technology concepts are
discussed briefly in the following paragraphs. These examples illustrate the increasing complexity of future aircraft control
systems and the resulting need for more highly reliable systems.

A transport with an asymmetric wing, being considered by NASA, may offer performance potential sufficient to bring about its
eventual development (Reference 4). The fuselage of this aircraft, designed to cruise at high transonic speeds, would be long
and slender without the conventional stiffening provided by the wing root structure. It is probable that a ride control system
would be necessary to provide a passenger comfort level commensurate with current transport aircraft. The motion character
of the airplane is somewhat unusual, principally because of aerodynamic coupling. A stability augmentation system would be
required to decouple the motion and provide capability for conventional piloting techniques.

The freighter concept shown in Figure 2 would make extensive use of active control concepts to achieve its maximum
performance potential (Reference 5). The airplane is designed such that empty weight is only 25 percent of the gross weight,
compared to about 40 percent for current technology freighter designs.

Trim and maneuver load control systems would permit adjustment of the airload distribution to closely match the weight
distribution, thus greatly reducing wing bending moments and leading to reduced structural weight. Additional performance
improvement could be obtained by utilizing augmented stability and positioning the center of gravity for maximum
performance. This airplane will probably require digital control system technology, with its capability for monitoring system
state and selecting healthy channels. The size of the airplane, together with the large number of separate control surfaces
and the necessity for individual commands, leads to a natural use for fly-by-wire.

Another application of active control technology demonstrating the complexity possible is the heavy-lift airship concept
proposed by Goodyear (Reference 6). Figure 3 shows a model demonstrating the concept, where four Sikorsky CH-54B
helicopters are attached to a 2.5 million cubic foot Dacron/neoprene hull. The four helicopters would be controlled in parallel
from a single cockpit by a digital fly-by-wire control system.

The helicopters are gimballed in pitch by main rotor cyclic pitch and driven by servo controlled actuators in roll to offset
gimbal coupling forces resulting from rotor torque. The tail rotors on the aft helicopters are replaced with propellers and
reoriented to provide the pusher capability required for forward flight in the unloaded condition. The airship would be flown using
standard helicopter controls. All helicopters would be controlled by a command pilot who would have electric cyclic and
collective sticks and rudder pedals that generate fly-by-wire commands to all four helicopters. A fly-by-wire system for
controlling a tandem helicopter was developed by Boeing-Vertol during the recent heavy lift helicopter program.


Figure  2.  Special  Purpose  Freighter 


Figure  3.  Goodyear  Heavy  Lift  Airship  Concept 


The airship would be taken off and landed vertically under manual control of the command pilot. Manual controls for
longitudinal and lateral flight, hover, and pitch and roll trim would also be provided. The automatic flight control system would
include heading, altitude and pitch attitude hold modes for forward flight. The precision hovering system would have
longitudinal and lateral position holds with creep capability, altitude hold, and roll and pitch attitude hold.


To meet future demands for improved control system reliability, significant advancements will be achieved in system
components, redundancy management, and built-in-test techniques. The following sections summarize some of the more
significant trends.

3.0  COMPONENT  RELIABILITY  TRENDS

3.1  Computational  Trends 

In the computational area, a definite trend toward digital implementation is seen in both commercial and military aircraft.
Digital techniques, which have already been applied to various subsystems on current wide-body transports, are now being
evaluated for the more critical flight control system applications. The expected advantages, such as reduced initial costs,
improved reliability, easier troubleshooting and greater flexibility, are generating considerable interest in the digital approach.
Boeing and McDonnell Douglas are already studying digital autopilot mechanizations for new transports known as 7X7 and
DC-X (Reference 7). It has been estimated that a digital mechanization of the L-1011 autopilot could result in savings of 25
percent in initial cost, 30 percent in maintenance costs and 20 percent in weight. In addition, it should provide a 25 percent
improvement in reliability.

While digital flight controls are generally most attractive for new aircraft where they can be designed integrally with the
airframe without an unusual nonrecurring expense burden, some retrofit interest is seen. McDonnell Douglas, for example,
has included a digital flight control system in one of its DC-9-50 aircraft proposals. Boeing is also studying digital control
retrofits for several of its products, including the 727 and B-52 aircraft. These systems would combine existing sensors and
subsystems with a new digital computer that includes a very high degree of fault detection capability.

3.1.1  Digital  Versus  Analog  Implementation 

The digital computer has several potential advantages for reliable control system applications. First, it can provide superior
test coverage with less hardware than equivalent analog built-in-test (BIT) systems. A typical analog BIT with 85 to 95 percent
test coverage requires BIT circuitry that amounts to 20 or 25 percent of the total system hardware (Reference 8). In an
equivalent triplex digital mechanization, the additional memory and I/O hardware requirement would comprise only one to
four percent of the total system memory and interface. Second, a digital mechanization eliminates tolerance accumulation.
Third, sophisticated signal selection algorithms, reasonableness testing and performance monitoring are possible that far
exceed analog system capabilities. Fourth, serial intercomputer links can be used to reduce the volume of cross channel
wiring.
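
The signal selection and reasonableness testing mentioned above can be illustrated with a minimal sketch. It assumes a triplex system, a mid-value (median) selection rule and a simple absolute-difference monitor; the function names, signal values and threshold are hypothetical and are not taken from any system described in this report.

```python
from statistics import median


def mid_value_select(channels):
    """Return the middle value of the redundant channel signals."""
    return median(channels)


def reasonableness_flags(channels, selected, threshold):
    """Flag each channel whose signal differs from the selected value
    by more than the (assumed) monitor threshold."""
    return [abs(value - selected) > threshold for value in channels]


# Three channels measuring the same pitch rate (deg/s).
# The third channel is assumed to have failed hard-over.
signals = [2.01, 1.98, 25.0]
selected = mid_value_select(signals)
flags = reasonableness_flags(signals, selected, threshold=0.5)
print(selected, flags)
```

With these inputs the selected value is taken from a healthy channel and only the third channel is flagged. An analog equivalent would need dedicated comparator hardware for each such check, which is the point made in the text about BIT hardware overhead.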

The digital computer does have some potential disadvantages with respect to reliable systems, however. Failure modes
and effects tend to be difficult to characterize, and some failures tend to be difficult to detect with software self-test alone. In
addition, digital computer implementations are susceptible to multiple channel generic software failures. Solutions do exist
for these limitations; therefore, they should not significantly impede the trend toward digital computation. For example, a low
cost microprocessor monitor could be added to each channel to detect both computer hardware and software errors.

3.1.2  Implementation  Technology  Trends 

There are a number of current developments and trends in digital system implementation technology that will contribute to
the advancement of reliable flight control systems. This section will concentrate on computational subsystem trends;
component trends will be discussed in subsequent sections.

The trend in overall airborne computational system architecture is toward distributed systems, which simply means that
each aircraft system function, such as flight control, has its own dedicated processors, but is able to exchange data with other
systems, such as navigation and air data (Reference 9). Factors influencing the trend include component cost and conflicting
redundancy requirements of various subsystems. The continually reducing cost of large scale integrated circuit computing
components is reducing the economic advantage of the large central computer. In addition, the requirement for flight critical
functions to be redundant tends to militate against mixing them with less critical functions. To do so would impose the same
rigid level of testing and control on noncritical software as for critical system software. This would result in an unnecessary
cost penalty in most cases.

State-of-the-art developments in the area of large scale integrated microcircuits have made it possible to consider the use
of multiplex communications links in future flight control systems. Flight control systems, and most particularly redundant
systems, generally require large numbers of communications paths between the flight control computer and the various
subsystems which interface with it. These subsystems include sensors, actuators, controls and displays, in addition to other
computer based subsystems, such as navigation. The potential benefits of multiplexing include reduced wiring complexity
and weight, interface standardization, system flexibility (sensors can be added without extensive rewiring), reduction in connector
pins needed and improvement in electromagnetic compatibility due to fewer radiating wires.

Future reliable flight control systems will utilize optoelectronics and fiber optics for interchannel communications.
Optoisolators can be used to eliminate ground loop problems, while fiber optics provide digital cross-communications
between channels and subsystems that are free of electromagnetic interference.

A definite trend toward asynchronous operation of digital computers in redundant channels is seen. Flight experience with
redundant digital systems, such as the Air Force Flight Dynamics Laboratory A-7 tests, and simulation studies have pointed
out the hazards of synchronous computer operation. One such study was conducted by Lear Siegler. The purpose of the
Lear Siegler simulation study was to examine input accuracy and bias effects, independent sampling/computation rates,
high input signal rates, and integrator divergence.

In the study of the trade-offs between synchronous and asynchronous operation, those factors which would appear to be
differences between the two approaches are discussed (Reference 10). The main points addressed are the benefits which
are offered by either approach, the mechanization differences, the operational differences and the reliability differences. This
study verifies that asynchronous operation provides more reliability when employed in redundant digital flight control
systems through the avoidance of the potential single point failures of synchronized clock operation. It also avoids the
necessity to develop and qualify redundant hardware clocks or to develop and validate the software logic for a software clock.
This will result in lower program costs for the asynchronous approach. Another benefit of asynchronous operation is the
reduction in EMI induced control surface transients. This results from the fact that there is a low probability of all channels
being in the same computation cycle at the time the EMI effect is present. The subsequent output signal selection will
eliminate the single channel transient.
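
The EMI argument can be made concrete with a small sketch. It is a simplified model under stated assumptions, not a description of the Lear Siegler simulation: four channels sample the same input once per frame, a short disturbance corrupts any sample taken while it is present, and the output is selected as the median of the four channel values. The frame length, disturbance timing, signal values and fixed sampling offsets are all hypothetical.

```python
from statistics import median

FRAME_US = 20_000        # assumed computation frame, microseconds
EMI_START_US = 100_500   # assumed start of the disturbance
EMI_WIDTH_US = 2_000     # assumed duration of the disturbance
TRUE_VALUE = 1.0         # undisturbed input signal
SPIKE = 50.0             # value read while the disturbance is present


def sample(t_us):
    """Input value seen by a channel that samples at time t_us."""
    in_emi = EMI_START_US <= t_us < EMI_START_US + EMI_WIDTH_US
    return SPIKE if in_emi else TRUE_VALUE


def worst_selected_output(phases_us, n_frames=10):
    """Largest median-selected output over n_frames, given each channel's
    sampling offset within the frame."""
    worst = TRUE_VALUE
    for k in range(n_frames):
        outputs = [sample(k * FRAME_US + phase) for phase in phases_us]
        worst = max(worst, median(outputs))
    return worst


synchronous = [1_500, 1_500, 1_500, 1_500]     # all channels sample together
asynchronous = [1_500, 6_000, 11_000, 16_000]  # channels sample at different times
print(worst_selected_output(synchronous))
print(worst_selected_output(asynchronous))
```

In the synchronous case every channel samples during the disturbance, so the selected output carries the full transient. In the asynchronous case only one channel is affected and the selection logic rejects it. Real asynchronous channels drift relative to one another rather than holding fixed offsets, which this sketch does not model.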

A trend towards using a library of software modules which can be tailored and linked to fit the software requirements of a
specific system will have important impacts on both system cost and reliability (Reference 9). This is a significant change
from current avionics software practice in which ad hoc techniques are used on a system basis, producing software that is
both expensive and unique. Being able to select and tailor already validated modules to satisfy a new requirement also
contributes to reliability because validation and verification of the software will tend to be more complete.

A trend  towards  the  use  of  higher  order  languages  is  an  important  companion  of  the  library  of  modules  trend  in  order  for  the 
library  to  be  transferable  from  one  computer  to  another.  Structured  programming  and  software  testing  techniques  will  tend  to 
alleviate  the  reliability  questions  sometimes  associated  with  computer  object  code  generated  by  higher  order  language 
compilers. 

Technology advancements in areas of microprocessors and memories are destined to produce a large impact on the
development of future reliable control systems. Mass memories, now dominated by rotating disks and tapes, will soon be
replaced by solid-state equivalents, such as magnetic bubble and charge-coupled device memories. Core memories and
volatile semiconductor memories will undoubtedly give way to nonvolatile semiconductor memories which are electrically
alterable.


3.2  Electronic  Trends 

The control system technology area which is currently undergoing the most rapid development is electronics. Growth
areas of interest to reliable system designs include large scale integrated circuit technology, microprocessors, memories,
data converters, packaging technology, mass storage and fiber optics.

Probably the biggest single contributor to the current downward trend in the cost-performance ratio of digital electronics
is the advancement in large scale integrated circuit production (Reference 9). This technology allows
hundreds, or even thousands, of logic functions to be implemented on a single integrated circuit chip. Because the use of
these chips reduces parts count, reliability as well as cost is improved. This technology has led to low-cost off-the-shelf
microprocessors, memories and various other computational building blocks which can be used in flight control systems.

The dramatic growth of integrated circuit production densities in the seventies is illustrated in Figure 4. Since 1971, memory
densities have increased by more than an order of magnitude, from one thousand to sixteen thousand bits per chip. Logic
integrated circuit densities have made similar gains. It is not unreasonable to predict that production densities will reach sixty
thousand by 1980.
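
The growth rate implied by these figures can be checked with a short calculation. The sketch below assumes steady exponential growth between the 1971 figure and the 1980 prediction, which is an assumption of the example rather than a claim made in the text; the function name is hypothetical.

```python
import math


def doubling_time_years(start_density, end_density, years):
    """Doubling time implied by growth from start_density to end_density
    over the given number of years, assuming exponential growth."""
    return years / math.log2(end_density / start_density)


# One thousand bits per chip in 1971, sixty thousand predicted for 1980.
t_double = doubling_time_years(1_000, 60_000, 1980 - 1971)
print(round(t_double, 2))
```

Under that assumption, the prediction corresponds to density doubling roughly every year and a half.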

LSI technology has spawned a significant new class of device, the microprocessor. The microprocessor is defined as a
standard programmable LSI device which consists of a parallel arithmetic unit, a control unit, and a general purpose parallel data bus
for memory and external device communications. This chip (or chip set) can be combined with LSI memory chips to realize a
general purpose microcomputer at extremely low cost.

These microcomputers, having the programmability of conventional general purpose computers, will be used to perform
computational functions and to replace hard-wired logic. In both applications a significant fact is that there is no longer a
driving force to use the device efficiently. System level cost trade-offs tend to lead to dedicated use of these devices for
certain functions even though this may result in, for example, the device being kept busy only 20 percent of the time.

Figure 4. Integrated Circuit Production Densities in the Seventies

In random access memories, LSI technologies have already given a clear indication that they are replacing expensive and
space consuming core memories (Reference 11). In some current designs, read only memories are being used for program
instruction storage while volatile semiconductor random access memories are used for variable storage. The trend is toward
eliminating core even for critical flight control applications where nonvolatile variable storage memory is a requirement. Fast
nonvolatile semiconductor read-write memories will soon become available to fill that need. Also noteworthy is the
development of charge-coupled device (CCD) memory systems as solid-state replacements for disks and drums. In
architecture, the CCD memory differs from the familiar ROM or RAM so radically that it deserves further discussion. The new
devices basically simulate the operation of rotating drums. For example, one 16,384-by-1-bit chip is organized so that it can
combine serial and random-access functions. The devices are arranged as 64 256-bit shift registers in which four-phase
clock signals simultaneously shift the data. Each shift register can be thought of as representing a single track in a
conventional drum, and each track can be thought of as being divided into 256 sectors corresponding to the 256 CCD data
storage cells in each line. The "rate of rotation" of this semiconductor drum therefore is controlled by the four-phase clock.
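
The drum analogy can be sketched as a toy model. Only the organization of 64 tracks of 256 cells comes from the text; the class name, the address-to-track mapping and the way shifts are counted are assumptions made for illustration and do not describe any particular device.

```python
TRACKS = 64      # shift registers on the chip
SECTORS = 256    # storage cells per shift register


class CcdDrum:
    """Toy model: every clock step shifts all tracks by one cell together."""

    def __init__(self):
        self.cells = [[0] * SECTORS for _ in range(TRACKS)]
        self.position = 0   # sector currently at the read/write point

    def _rotate_to(self, sector):
        """Clock the registers until `sector` is reachable; return shifts used."""
        shifts = (sector - self.position) % SECTORS
        self.position = sector
        return shifts

    def write(self, address, bit):
        track, sector = divmod(address, SECTORS)   # random access by track
        shifts = self._rotate_to(sector)           # serial access within it
        self.cells[track][sector] = bit
        return shifts

    def read(self, address):
        track, sector = divmod(address, SECTORS)
        shifts = self._rotate_to(sector)
        return self.cells[track][sector], shifts


drum = CcdDrum()
print(drum.write(300, 1))   # address 300 is track 1, sector 44
print(drum.read(300))       # same sector, so no further shifting
print(drum.read(40))        # earlier sector, so nearly a full "rotation"
```

The number of shifts needed before an access plays the role of rotational latency in a mechanical drum, while the choice of track is immediate.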

CCD memories have clear advantages over existing high-speed drums. They are an order of magnitude faster, despite the
fact that their data is accessible only by block or line. At present, any line is accessible in 100 microseconds, and that time is
expected to decrease significantly over the next few years. CCD memories also have no need for high-speed mechanical
movements and should therefore be much more reliable and last longer than today's drums with their extremely fast rotating
mechanical assemblies. Estimates are that a simple five-board CCD system could be only half the cost of a 5-million-bit
rotating system.

Additional electronic component advances of interest to reliable flight control designers include improved data converters,
operational amplifiers, and optical couplers (Reference 12). Microcircuit size monolithic analog-to-digital and
digital-to-analog converters which require no external components will become available. These are already available in
hybrid models with somewhat larger dimensions, or in monolithic circuits which require a few external components. Improved
field effect transistor input operational amplifiers are being developed using ion implantation production techniques. High
gain optical couplers for both analog and digital applications are on the way. These will be used to avoid failure propagation
paths for more reliable redundant systems.

New  methods  of  packaging  electronics  will  see  more  widespread  application  in  reliable  control  systems  (Reference  13). 
Techniques  such  as  die-stamped  circuit  boards,  multiwire,  and  stitch  wiring  will  provide  manufacturing  alternatives  that  are 
free  of  environmental  impact.  Of  the  packaging  techniques  mentioned,  stitch  wiring  is  perhaps  the  most  attractive  for  flight 
critical  systems  because  of  its  very  high  reliability. 

Hybrid circuits are emerging as an extremely high density method of packaging digital electronics. Thick-film hybrids
normally are thought of as relatively simple analog circuits on relatively small substrates, like a quarter inch, but this is a
misconception. Many thick-film hybrid manufacturers now are turning out large-scale digital hybrids in units as large as nine
inches square and containing upwards of 220 monolithic integrated circuit chips. These units, aimed at the
military/avionics/space field, are possible because of the development of multilayering in thick-film hybrids and the greater
use of computer-aided design and manufacturing. Digital circuit densities as high as 40 monolithic IC chips per square inch
are possible by this multilayered technique as compared to only 2 DIP ICs per square inch for conventional packaging.


Fiber optics will become more and more attractive to redundant system designers as optical fiber, connector, emitter and
sensor technologies continue to develop (Reference 14). The primary reason for considering optical data transmission for
fly-by-wire systems is its great potential for improved survivability to physical damage and electromagnetic phenomena such
as lightning strikes. The survivability advantage over wire is achieved because an optical cable can be divided into two parts
and recombined by means of Wye connections, with data transmission integrity maintained by only one of the paths. The
second path may be damaged or destroyed without coupling a disruption into the remaining path. This is, of course, not
possible with conventional wire where a short circuit in any strand causes total failure of all points tied to that wire strand. The
potential for lightning survivability is estimated on the basis of immunity to induced currents in the optical cable and the
electrical isolation of all circuitry from that cable.

To illustrate how an optical channel has a greater damage survivability potential than an electrical data channel for a
fly-by-wire application, consider a quad redundant configuration of four control computers transmitting control data to control
actuators. Now, as illustrated in Figure 5a, suppose we wished to improve damage survivability by running every wire twice,
once along the left side and once along the right side of the aircraft. If damage on one side of the aircraft merely opens a wire,
then we would indeed have four surviving data transmission paths, one for each of the four control computers. However,
damage is rarely so benign, and one should expect that the damaged conductors would be shorted to each other or to
structure (ground) as shown in Figure 5a. With such electrical shorting of conductors, the redundant set of wires on the
undamaged side is of course useless. (A short of only one strand in a multistrand electrical conductor destroys the function
of the entire conductor.) Consequently, wiring a quad redundant fly-by-wire system as shown in Figure 5a would be a
fundamental violation of system safety criteria because a single source of damage could fail all four control channels.

The proper way to wire a quad redundant, electrical fly-by-wire system is illustrated in Figure 5b. Here, two channels are
wired along the left side of the aircraft and two along the right side of the aircraft. Damage vulnerability is still severe as shown
in the illustration, for now two cables and hence two of the four channels can be disabled by a single source of damage. Thus
a double fail-operative fly-by-wire system can be reduced to a fail-passive, dual channel system following a single source of
damage.

The advantage of the optical system is illustrated in Figure 5c. Each of the four computers provides one cable along the
aircraft's left side and one along the right side. This is the same configuration as in Figure 5a except that an optical short
cannot occur if a cable is destroyed. Hence, the left and right side transmissions are uncoupled. Destroy all the cables on the
left side, and all four cables on the right side continue transmitting as if nothing had happened. The quad redundant system
remains double fail-operative despite destruction of all cables on one side of the aircraft. This remarkable survivability can be
achieved with fiber optics because the equivalent of an electrical short cannot occur in a fiber optic cable. That is, the only
failure mode is the equivalent of the electrical open.
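
The comparison of the three wiring arrangements can be summarized in a short sketch. It encodes only the failure model stated in the text (a damaged electrical cable shorts and disables everything tied to it, while a damaged optical cable behaves as an open); the function name, channel labels and data layout are hypothetical.

```python
def surviving_channels(config, damaged_side):
    """Count channels still delivering data after every cable on one side
    of the aircraft is destroyed.

    config maps channel name -> (medium, sides the channel's cables run along).
    """
    alive = 0
    for medium, sides in config.values():
        hit = damaged_side in sides
        if medium == "electrical":
            ok = not hit   # a short on either run fails the whole channel
        else:
            ok = any(side != damaged_side for side in sides)  # one intact run suffices
        alive += ok
    return alive


# Figure 5a: every electrical channel run along both sides.
fig_5a = {ch: ("electrical", ("L", "R")) for ch in "ABCD"}
# Figure 5b: two electrical channels per side.
fig_5b = {"A": ("electrical", ("L",)), "B": ("electrical", ("L",)),
          "C": ("electrical", ("R",)), "D": ("electrical", ("R",))}
# Figure 5c: every optical channel run along both sides.
fig_5c = {ch: ("optical", ("L", "R")) for ch in "ABCD"}

for name, cfg in [("5a", fig_5a), ("5b", fig_5b), ("5c", fig_5c)]:
    print(name, surviving_channels(cfg, "L"))
```

For damage to the left side, the sketch gives zero surviving channels for the arrangement of Figure 5a, two for Figure 5b and four for Figure 5c, matching the discussion above.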

3.3  Sensors 

The  sensors  most  commonly  employed  in  automatic  flight  control  systems  are  gyros,  accelerometers,  and  differential 
transformers.  Reliability  of  linear  variable  differential  transformers  is  well  established,  and  they  will  continue  to  be  used 
successfully  on  future  aircraft.  Servonulled  linear  accelerometers  have  usually  proved  more  than  adequate  in  most  past 
applications,  and  improvements  are  being  made.  Probably  the  most  significant  improvements  will  be  made  in  angular  rate 
sensors. 

Pendulous force-rebalance linear accelerometers will continue to dominate reliable control system applications. Typical
units will have no wearout modes and will offer very high reliability. One example is the new Honeywell accelerometer that
incorporates a unique mechanization resulting in low cost and high accuracy with time and environmental exposure
(Reference 15). The pendulum and suspension are fabricated from quartz fibers arranged as shown in Figure 6.

A thin film of silver is vapor deposited over the quartz suspension and pendulum. The base of the pendulum operates in a
permanent-magnet field, providing a one-turn torque generator. The null detector consists of a light source and a dual silicon
photodiode. The p-layer of the silicon p-n junction is divided into two parts by a thin separation. When the base of the
pendulum coincides with this separation, the null position is achieved, and the dc outputs of the dual photodiode are
balanced. The servoamplifier used to control the pendulum to the null position is a standard commercial µA741 integrated
circuit. The lamp is rated for a useful life in excess of 20,000 hours. Severe environmental conditions have been applied both
in test and in the field with no lamp or suspension failures.

Studies have shown that spin motor and bearing failures account for about 65 percent of all rate gyro failures (Reference
16). Rate sensors used on future highly reliable automatic flight control systems will not use a spin motor and attendant
bearings. Rate sensors that eliminate this weakness are currently being developed. These include a solid state rate sensor, a
ring laser gyro and a magneto hydrodynamic rate sensor.

The General Electric "VYRO" solid state rate sensor eliminates the rotating mass with its associated bearings, motor and
gimbal, and replaces them with a vibrating beam supported by two wires and driven by piezoelectric transducers. Without
rotating parts, the VYRO is potentially more reliable than the conventional rate gyro. The predicted mean time between failure
for the unit is 45,000 hours. Performance of the VYRO has been demonstrated on an F-4J airplane and in laboratory tests.
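
To put a mean time between failure figure in mission terms, the sketch below converts it to a per-flight failure probability. It assumes a constant failure rate (exponential model) and a hypothetical two-hour flight; neither assumption comes from the source, and the function name is illustrative.

```python
import math


def failure_probability(mtbf_hours, mission_hours):
    """Probability of at least one failure during the mission, assuming a
    constant failure rate of 1 / mtbf_hours."""
    return -math.expm1(-mission_hours / mtbf_hours)


p = failure_probability(45_000, 2.0)   # 2.0 hours is a hypothetical flight length
print(f"{p:.2e}")
```

For missions that are short compared with the mean time between failure, the result is very close to the simple ratio of mission time to MTBF.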

Because the ring laser rate sensor operation depends only on optical and electrical phenomena, it is mechanically simpler
than conventional rotating-mass gyroscopes. A Honeywell ring laser gyro uses a laser block of dimensionally stable Cer-Vit
material made by Owens-Illinois, Inc. (Reference 17). Electronic circuits in the gyro case control the laser and convert its
optical output to an electrical signal. One of these gyros has operated continuously for more than 18,000 hours without any
degradation in performance or increase in lasing threshold current. The Sperry Ring Laser Sensor is similar, but it uses an
aluminum optical cavity that is heated to provide the required dimensional stability. The instrument case is evacuated and
sealed, with all manufacturing adjustments locked prior to sealing.

The Honeywell magneto hydrodynamic rate sensor uses an angular accelerometer in the form of a torus of liquid metal as
its basic sensor (Reference 15). The torus is continuously rotated about a diameter of the torus, which permits the device to
measure angular rate in two axes. The complete rate sensor consists of a magneto hydrodynamic angular accelerometer, a
synchronous hysteresis drive motor, a two-phase reference generator which permits resolution of the output into its two
orthogonal axes, and a slip ring assembly to transfer the output signal from the rotating element to the preamplifier mounted
within the hermetic seals. The sensor projects a very high reliability, but slip ring wearout requires a 1500-hour time between
scheduled replacement.

Figure 5. Quad-Redundant Fly-by-Wire System Wiring (panel label: Optical Transmission with Split Paths for Each Channel;
Invulnerability of All Four Channels to Damage on One Side of Aircraft)


Figure 6. Honeywell Linear Accelerometer (labeled elements: torquer magnets, vapor deposited conducting film, amplifier)


3.4  Control  Surface  Actuators


Highly  reliable  automatic  flight  control  systems  will  continue  to  use  hydraulic  actuators  to  drive  the  aerodynamic  control 
surfaces.  Trends  in  actuator  technology  are  toward  actuator  redundancy  schemes  tailored  to  individual  aircraft 
requirements,  with  the  systems  operating  at  pressures  above  the  usual  3000  psi.  Improvements  will  be  made  in  hydraulic 
power supply units to increase reliability and reduce maintenance and downtime.

Perhaps the most significant changes in control surface actuation will come about in the hydraulic supply unit. Studies
funded by the U.S. Navy show that significant improvements in system efficiency, maintainability, survivability and reliability
can be attained by using 8000 psi supply pressures and incorporating advanced filtration and distribution technology
(Reference  18).  The  Navy  lightweight  hydraulic  system  concept  is  to  use  high  pressure  hydraulics  to  achieve  large 
reductions  in  system  weight  and  space  requirements.  This  will  be  important  in  future  high  performance  aircraft  as 
aerodynamic  considerations  require  high  density  airframes  and  thin  wings,  resulting  in  highly  confined  areas  for  installation 
of  system  components. 

The hydraulic supply systems will incorporate advanced technologies, including those related to advanced hydraulic
fluids, permanent tube connectors, hermetically sealed components, modularized components, metallic and double stage
seals, titanium tubing and ultrafine system filtration. The Navy predicts that a 20 to 25 percent savings in maintenance manhours
and resulting aircraft downtime can be realized with the lightweight hydraulic system by the application of multiple stage seals
and metallic designs, which will extend replacement life in both static and dynamic conditions. The use of permanent type
connectors will eliminate a large number of leakage paths and further reduce the maintenance manhours required to service
transmission lines and components. This will result in less introduction of contaminant into the system. Application of finer
micron rated filters incorporating improved filter media will result in higher efficiency and contaminant holding capacity.

As aircraft rely more heavily on automatic flight control systems, the majority of commands to the control surface actuators
will come from the automatic control channels. It is inevitable that the simpler, higher performance fly-by-wire implementation
will result. One of the significant advantages of fly-by-wire is the potential elimination of primary control linkages, rods and
cables. Future highly reliable control systems will employ integrated actuators, capable of positioning the control surface
directly from an electrical command (Reference 15). The integrated actuator may contain some form of driver actuator, or it
may be a straightforward electrohydraulic servoactuator with integral monitoring elements. The integrated actuator could
even contain its own electrically-powered hydraulic supply, perhaps using the controlled-displacement servopump principle.


The integrated actuators will probably use analog servo feedback loops, although research efforts now are directed toward
development of digital valves and feedback encoders. Partial digital servoloops could be employed, but such hybrid systems
are not competitive with analog at the present time.

Highly reliable flight control systems will continue to be achieved through redundancy management. The redundancy
manager should think in system level terms at the outset to assure that the redundancy management philosophy considers all
flight  control  system  components.  This  should  include  secondary  functions  such  as  central  air  data  systems  and  electrical 
and  hydraulic  power  supplies.  It  is  important  to  remember  that  complex  redundancy  management  is  only  a means  to  achieve 
a reliability  requirement  on  the  system  level.  Future  architectures  will  meet  the  high  reliability  requirements  through  both 
component  reliability  and  redundancy  management. 


Past  and  present  flight  control  systems  have  been  treated  as  complete  systems  somewhat  independent  from  the 
navigation  system.  Future  systems  will  see  an  expansion  of  the  systems  engineering  concepts  to  integrate  the  navigation 
and flight control systems. These two systems will be cross fed with each other through redundant digital data buses, such as
the one defined by MIL-STD-1553. The Low Life Cycle Cost Avionics System (LLCCAS) program, currently being conducted
by  Boeing  for  the  Air  Force,  is  considering  such  a concept  for  the  B-52G/H  aircraft.  Figure  7 illustrates  a candidate  LLCCAS 
architecture.  The  redundancy  management  architecture  shown  is  a dual  box  configuration  employing  in-line  monitoring. 


Figure  7.  Low  Life  Cycle  Cost  Avionics  Architecture 


4.1  Trends  in  Sensor  Redundancy  Management 

Future sensor redundancy data management techniques will permit the reduction of sensor unit weight, power and volume
by skewing the input axes of multiple sensors with respect to each other. The approach will be to apply these skewed sensor
concepts to reduce the number of rate gyros required for a reliable and survivable fly-by-wire system (Reference 16). Rate
gyros have been one of the higher failure rate items for contemporary systems.

In system concepts which include a general-purpose digital processor, the use of skewed sensor arrays may provide a
significant redundancy management advantage (Reference 15). A digital processor is nearly mandatory because of the
difficulty in accurately converting the skewed sensor data to the required orthogonal set for aircraft control in an analog
computation implementation. Because of the great advantages of reducing the total sensor count in redundant systems,
skewed sensor arrays should be included in the redundancy management trade studies.

Conventional flight control, attitude reference, and inertial systems have normally used orthogonal triads of gyros and
accelerometers to obtain three-axis rate, attitude and/or velocity information. Redundant systems have been mechanized
simply by duplicating the triads as necessary.

The skewed redundant strapped-down array is an efficient means for increasing reliability. The desired reliability level
dictates the number of sensors which must be used in a system. The dual (or triple) redundancy of a five- (or six-) sensor array
may be necessary to achieve the prescribed reliability level. Because the effective redundancy of dual or triple orthogonal
sets may be achieved with pentad or hexad arrays, which require fewer sensors, the overall system reliability is improved by
the deletion of these relatively less reliable devices.
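
As a minimal sketch of the conversion a digital processor would perform for a skewed array, the following Python fragment recovers the three orthogonal body rates from five skewed rate sensors by least squares, and shows that the estimate is unchanged when two sensors already identified as failed are dropped. The cone geometry, the half-angle, the function names and the use of least squares are illustrative assumptions, not taken from References 15 or 16; detection and isolation of the failed sensors are not modeled here.

```python
import math

def cone_axes(n, half_angle_deg):
    """Unit input axes of n sensors spaced evenly on a cone about the z axis (assumed geometry)."""
    t = math.radians(half_angle_deg)
    return [[math.sin(t) * math.cos(2 * math.pi * i / n),
             math.sin(t) * math.sin(2 * math.pi * i / n),
             math.cos(t)] for i in range(n)]

def det3(a):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def solve3(a, b):
    """Solve the 3x3 system a x = b by Cramer's rule."""
    d = det3(a)
    solution = []
    for c in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][c] = b[r]          # replace column c with the right-hand side
        solution.append(det3(m) / d)
    return solution

def estimate_rates(axes, measurements, failed=()):
    """Least-squares body rates using only the sensors not listed in `failed`."""
    used = [(h, m) for i, (h, m) in enumerate(zip(axes, measurements)) if i not in failed]
    hth = [[sum(h[r] * h[c] for h, _ in used) for c in range(3)] for r in range(3)]
    htm = [sum(h[r] * m for h, m in used) for r in range(3)]
    return solve3(hth, htm)         # normal equations (H'H) x = H'm

axes = cone_axes(5, 54.7356)
true_rates = [0.10, -0.20, 0.05]    # rad/s about body x, y, z
measurements = [sum(h[k] * true_rates[k] for k in range(3)) for h in axes]
measurements[1] = 9.9               # simulated hardover failures on sensors 1 and 3
measurements[3] = -9.9
estimate = estimate_rates(axes, measurements, failed=(1, 3))
```

In this geometry any three of the five axes are linearly independent, so the three body rates remain recoverable after any two isolated failures. A dual orthogonal set of six gyros loses an axis if both failures fall on the same axis.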

4.2 Trends in Redundancy Management Within the Computer Mainframe

Redundancy management techniques use two general categories of failure detection: self-test and comparison test. The
comparison tests, first used in redundant analog systems, will continue to be used in the future because of their proven track
record.

The  two  basic  approaches  to  comparison  testing  are  (Reference  16): 

• Cross-SSD (Signal Selection Device), which uses the output of an SSD as the good reference signal against
which  all  other  channels  are  compared. 

• Cross-Channel,  which  compares  each  channel  with  the  other  channels  and  operates  independently  of  any  SSDs 
in  the  signal  chain. 

Typical cross-SSD and cross-channel monitors are shown in Figure 8, as they might appear in Channel 1 of a quadruplex
configuration.

Future  highly  reliable  flight  control  systems  will  improve  system  reliability  by  providing  cross  strapping  of  input  signals.  The 
SSD can, in principle, provide this function without monitoring the input signals; i.e., without removing a failed signal.

By  detecting  and  removing  a failed  signal,  failure  detection  can  improve  the  benefits  of  an  SSD  relative  to  its  cross 
strapping  function.  As  an  example,  without  failure  detection  the  output  of  an  SSD  with  four  inputs  will  fail  after  two  inputs  have 
failed.  With  failure  detection  the  output  could  conceivably  fail  only  after  four  of  the  inputs  have  failed.  There  is  an  obvious 
trade  off  between  the  probability  of  two  failures  versus  the  probability  of  those  combinations  of  detected  and  undetected 
failures  and  nuisance  alarms  which  will  cause  disengagement  of  the  device. 
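
The interaction between signal selection and failure detection can be sketched as follows. The SSD is assumed here to be a mid-value selector (the median, or the mean of the two middle inputs for an even count); the text does not specify the selection law, and the function names, the threshold and the majority rule in the cross-channel monitor are likewise hypothetical.

```python
def mid_value_select(signals):
    """Assumed SSD law: median of the inputs (mean of the two middle values for an even count)."""
    s = sorted(signals)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else 0.5 * (s[mid - 1] + s[mid])

def cross_ssd_monitor(signals, threshold):
    """Flag each channel that differs from the SSD output by more than the threshold."""
    reference = mid_value_select(signals)
    return [abs(x - reference) > threshold for x in signals]

def cross_channel_monitor(signals, threshold):
    """Flag each channel that miscompares with a majority of the other channels."""
    flags = []
    for i, x in enumerate(signals):
        others = [y for j, y in enumerate(signals) if j != i]
        miscompares = sum(abs(x - y) > threshold for y in others)
        flags.append(miscompares > len(others) / 2)
    return flags

# One hardover failure in a quadruplex set: the SSD output is still good.
signals = [1.00, 1.02, 0.98, 25.0]
flags = cross_ssd_monitor(signals, threshold=0.5)
healthy = [x for x, bad in zip(signals, flags) if not bad]

# Two hardover failures in the same direction with no failure detection:
# the mean of the two middle values is corrupted.
undetected_output = mid_value_select([1.00, 1.02, 25.0, 25.0])

# With the first failure detected and removed, the second is outvoted.
detected_output = mid_value_select([1.00, 1.02, 25.0])
```

With these inputs both monitors flag only the fourth channel. The last two lines reproduce the trade described above: the four-input selector without detection fails on the second failure, while removing the first failed input lets the remaining three outvote the second.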

The  second  general  category  of  failure  detection  is  self-test  and  the  related  function  of  built-in  test  (BIT).  Self-test 
architectures  are  currently  in  a constant  state  of  revision  and  upgrading  to  improve  efficiency  and  coverage. 

Figure 8. Comparison Monitoring Techniques

Self-testing of digital computers involves a mix of hardware and software. Certain basic portions of the computer must be
operable before any self-testing can be conducted, e.g., power supplies and clocks. Failure of these basic portions must be
detected  by  hardware. 

With  these  basic  portions  of  the  computer  operating,  self-testing  of  the  computer  can  begin.  The  design  of  the  self-test 
program  is  based  on  the  inverted  pyramid  test  philosophy.  That  is,  the  program  first  tests  the  instructions  that  require  a 
minimum  of  logic  for  their  execution,  and  the  memory  locations  that  contain  the  self-test  program.  These  verified  instructions 
and  memory  locations  are  then  used  to  test  instructions  and  memory  on  the  next  higher  level.  This  process  is  continued  until 
all  of  the  instructions,  memory,  and  I/O  have  been  verified. 
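
The ordering implied by the inverted pyramid philosophy can be sketched as a small test sequencer. This is only an illustration of the sequencing idea: the level contents, names and simulated fault below are hypothetical stand-ins, and a real self-test program would exercise machine instructions and memory directly.

```python
def run_self_test(levels):
    """Run test levels in order, from the most basic upward.

    `levels` is a list of lists of (name, test_function) pairs. A level is
    attempted only after every test on the lower levels has passed, so each
    test may rely on the resources verified beneath it.
    """
    verified = []
    for depth, tests in enumerate(levels):
        for name, test in tests:
            if not test():
                return {"passed": False, "failed": name, "level": depth, "verified": verified}
            verified.append(name)
    return {"passed": True, "failed": None, "level": None, "verified": verified}

# Hypothetical stand-ins for real instruction, memory and I/O checks.
memory = list(range(16))
levels = [
    [("load/store", lambda: True), ("self-test memory", lambda: sum(memory[:4]) == 6)],
    [("add/compare", lambda: 2 + 2 == 4), ("data memory", lambda: sum(memory) == 120)],
    [("I/O wraparound", lambda: False)],   # simulated fault on the top level
]
result = run_self_test(levels)
```

Stopping at the first failure reflects the premise of the approach: a test on a higher level is only meaningful if everything it depends on has already been verified.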

The  use  of  parity  to  monitor  memory  storage  is  becoming  less  popular  among  digital  flight  control  system  suppliers.  It  is 
doubtful  that  this  use  of  parity  will  be  required  as  part  of  redundancy  management  of  future  systems.  The  added  expense  and 
complexity  for  increased  word  length  does  not  appear  to  be  cost  effective  for  future  digital  flight  control  systems. 

Another  area  that  impacts  the  digital  redundancy  management  architecture  is  the  question  of  synchronous  or 
asynchronous  operation.  The  trend  appears  to  be  moving  toward  asynchronous  operation  using  equalization  techniques  to 
prevent  numerical  integration  drift. 

4.3 Trends in Actuation Redundancy Management

This  appears  to  be  the  least  worked  redundancy  management  area  in  terms  of  new  architectures.  The  varied  levels  of 
actuation  redundancy  management  have  proven  reasonably  successful.  Component  reliability  still  remains  as  the  weakest 
link  and  probably  will  continue  to  be  in  the  near  future. 

Because of the fly-by-wire evolution, the state of the art is improving for redundant electrohydraulic servoactuators. Four
basic  categories  have  been  identified  for  the  purposes  of  classifying  redundant  actuators  (Reference  19).  All  categories 
apply  to  multiple  channel,  electrically  commanded,  closed  loop,  position  control  servos.  Redundancy  is  assumed  at  the 
electrohydraulic  interface  but  not  necessarily  for  the  power  actuator.  The  four  categories  define  techniques  for  combining  the 
outputs  of  redundant  servovalves  and  servoactuators. 

The first of the four categories identifies the general summing technique: averaging, active standby or voting. The second
category  defines  the  mechanical  variable  which  is  summed:  position,  velocity,  or  force.  A third  category  classifies  multistage 
servos  by  the  type  of  power  stage  command:  position  commanded  or  velocity  commanded.  The  last  category  classifies 
techniques  for  integrating  dissimilar  backup  control  channels:  series  or  parallel. 

High reliability requirements will continue to be met by the multicylinder hydraulic actuator. Two successful configurations
are the tandem cylinder and the multiple single cylinders arranged side-by-side along the control surface hinge line. The
tandem configuration has been built in dual and triple designs. Side-by-side configurations have been built, as well as
combinations of tandem and side-by-side applications to achieve "dual-dual" designs.

Further  flexibility  in  overall  actuation  system  concepts  is  afforded  by  the  use  of  "split  surfaces"  — combinations  of  parallel, 
independent  surfaces  operated  by  individual  (or  multicylinder)  actuators.  The  variations  of  actuator  configurations  and 
control  surface  arrangements  are  virtually  limitless. 

Actuation redundancy management is still the weakest reliability link in our future highly reliable flight control systems.
The  next  few  years  will  see  some  improvements  but  not  as  dramatic  as  the  computer  architecture  improvements. 

Summary

In  this  paper  we  survey  a number  of  methods  for  the  detection  of  abrupt  changes  (such  as  failures) 
in  stochastic  dynamical  systems.  We  concentrate  on  the  class  of  linear  systems,  but  the  basic  concepts, 
if  not  the  detailed  analyses,  carry  over  to  other  classes  of  systems.  The  methods  surveyed  range  from 
the  design  of  specific  failure-sensitive  filters,  to  the  use  of  statistical  tests  on  filter  innovations, 
to  the  development  of  jump  process  formulations.  Tradeoffs  in  complexity  versus  performance  are 
discussed. 


I.  Introduction 

With the increasing availability and decreasing cost of digital hardware and software, there has developed
a desire in several disciplines for the development of sophisticated digital system design techniques
that can greatly improve overall system performance. A good example of this can be found in the
field of digital aircraft control (see, for example, Doolin [45], Taylor [46], and Meyer and Cicolani
[47]), where a great deal of effort is being put into the design of aircraft with reduced static stability,
flexible wings, etc. Such vehicles can provide improved performance in terms of drag reduction
and  decreased  fuel  consumption,  but  they  also  require  sophisticated  control  systems  to  deal  with  problems 
such  as  active  control  of  unstable  aircraft,  suppression  of  flutter,  the  detection  of  system  failures, 
and  management  of  system  redundancy.  The  demands  on  such  a control  system  are  beyond  the  capabilities  of 
conventional  aircraft  control  system  design  techniques,  and  the  use  of  digital  techniques  is  essential. 

Another example can be found in the field of electrocardiography. In recent years a great deal of
effort has been devoted to the development of digital techniques for the automatic diagnosis of electrocardiograms
(ECG's; see, for example, [47]). Such systems can be used for preliminary screening of large numbers
of ECG's, for the monitoring of patients in a hospital, etc.

In this paper we review some of the recent work in one area of system theory that is of importance
in both of these examples, as well as in many other system design problems. Specifically, we will discuss
the problem of the detection of abrupt changes in dynamical systems. In the aircraft control problem
one is concerned with the detection of actuator and sensor failures, while in the ECG analysis problem
one wants to detect arrhythmias, i.e., sudden changes in the rhythm of the heart. For the sake of simplicity
in our discussion, we will refer to all such abrupt changes as "failures", although, as in the
ECG example, the abrupt change need not be a physical failure. Our aim in this survey is to provide an
overview  of  a number  of  the  basic  concepts  in  failure  detection.  The  problem  of  system  reorganization 
subsequent  to  the  detection  of  a failure  is  considered  in  several  of  the  references.  We  will  point  out 
these  references  in  the  sequel,  but  we  will  concentrate  primarily  on  the  detection  problem. 

The design of failure detection systems involves the consideration of several issues. One is usually
interested in designing a system that will respond rapidly when a failure occurs; however, in high performance
systems one often cannot tolerate significant degradation in performance during normal system
operation. These two considerations are usually in conflict. That is, a system that is designed to respond
quickly to certain abrupt changes must necessarily be sensitive to certain high frequency effects,
and this in turn will tend to increase the sensitivity of the system to noise (via the occurrence of
false alarms signaled by the failure detection system). The tradeoff between these design issues is best
studied in the context of a specific example in which the costs of the various tradeoffs can be assessed.
For example, one might be more willing to tolerate false alarms in a highly redundant system configuration
than in a system without substantial back-up capabilities.

In general, one would like to design a failure detection system that takes system redundancy into
account. For example, in a system containing several back-up subsystems we may be able to devise a simple
detection algorithm that is easily implemented but yields only moderate false alarm rates. On the
other hand, by implementing a more complex failure detection algorithm that takes careful account of system
dynamics, one may be able to reduce requirements for costly hardware redundancy.

In addition to taking hardware issues into consideration, the designer of failure detection systems
should consider the issue of computational complexity. One clearly needs a scheme that has reasonable
storage and time requirements. It would also be useful to have a design methodology that admits a range
of implementations, allowing a tradeoff study of system complexity vs. performance. In addition, it
would be desirable to have a design that takes advantage of new computer capabilities and structures
(e.g., designs that are amenable to modular or parallel implementations).

In this paper we survey a variety of failure detection methods, and, keeping the issues mentioned
above in mind, we will comment on the characteristics, advantages, disadvantages, and tradeoffs involved
in the various techniques. In order to provide this survey with some organization and to point out some
of the key concepts in failure detection system design, we have defined several categories of failure
detection systems and have placed the designs we have collected into these groups. Clearly such a
grouping can only be a rough approximation, and we caution the reader against drawing too much of an
inference about individual designs based on our classification of them (several of the techniques could
easily fall into a number of our classes). In addition, for the sake of brevity we have limited our
detailed discussions to only a few of the many techniques. Our choice of those techniques has been motivated
by a desire to span the range of available methods and by our familiarity with certain of these
algorithms. Finally, we have attempted to collect all of those studies of the failure detection problem
of which we are aware, and we apologize for any oversights.

II.  Formulations  of  the  Failure  Detection  Problem 

In this paper we are mostly concerned with the analysis of linear stochastic models in the standard
state space form

System Dynamics

$$x(k+1) = \Phi(k)x(k) + B(k)u(k) + w(k) \tag{1}$$

Sensor Equation

$$z(k) = H(k)x(k) + J(k)u(k) + v(k) \tag{2}$$

where u is a known input, and w and v are zero-mean, independent, white Gaussian sequences with covariances
defined by

$$E[w(k)w'(j)] = Q\delta_{kj}, \qquad E[v(k)v'(j)] = R\delta_{kj} \tag{3}$$

where $\delta_{kj}$ is the Kronecker delta. We think of (1)-(3) as describing the "normal operation" or "no failure"
model of the system of interest. If no failures occur, the optimal state estimator is given by the discrete
Kalman filter equations [33]


$$\hat{x}(k+1|k) = \Phi(k)\hat{x}(k|k) + B(k)u(k) \tag{4}$$

$$\hat{x}(k|k) = \hat{x}(k|k-1) + K(k)\gamma(k) \tag{5}$$

$$\gamma(k) = z(k) - H(k)\hat{x}(k|k-1) - J(k)u(k) \tag{6}$$

where $\gamma$ is the zero-mean, Gaussian innovation process, and the gain K is calculated from the equations

$$P(k+1|k) = \Phi(k)P(k|k)\Phi'(k) + Q \tag{7}$$

$$V(k) = H(k)P(k|k-1)H'(k) + R \tag{8}$$

$$K(k) = P(k|k-1)H'(k)V^{-1}(k) \tag{9}$$

$$P(k|k) = P(k|k-1) - K(k)H(k)P(k|k-1) \tag{10}$$

Here P(i|j) is the estimation error covariance of the estimate $\hat{x}(i|j)$, and V(k) is the covariance of
$\gamma(k)$. We refer to (4)-(10) as the "normal mode filter" in the sequel.

In addition to the above estimator, one may also have a closed loop control law, such as the linear
law

$$u(k) = G(k)\hat{x}(k|k) \tag{11}$$

We  then  obtain  the  normal  operation  configuration  depicted  in  Figure  1. 
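
For readers who prefer code to recursions, one cycle of the normal mode filter (4)-(10) can be sketched for the scalar case as below. The function name, argument names and the numerical values are illustrative assumptions; the matrix case replaces the products and the division by matrix products and an inverse.

```python
def normal_mode_step(x_pred, p_pred, z, u, phi, b, h, j, q, r):
    """One cycle of the scalar normal mode filter; comments give the equation numbers."""
    gamma = z - h * x_pred - j * u      # (6) innovation
    v = h * p_pred * h + r              # (8) innovation covariance
    k = p_pred * h / v                  # (9) filter gain
    x_filt = x_pred + k * gamma         # (5) measurement update
    p_filt = p_pred - k * h * p_pred    # (10) updated error covariance
    x_next = phi * x_filt + b * u       # (4) time update
    p_next = phi * p_filt * phi + q     # (7) predicted error covariance
    return x_next, p_next, gamma, v

# Constant state observed directly: phi = h = 1, no input, no process noise.
x_next, p_next, gamma, v = normal_mode_step(
    x_pred=0.0, p_pred=1.0, z=2.0, u=0.0,
    phi=1.0, b=0.0, h=1.0, j=0.0, q=0.0, r=1.0)
```

The function returns the innovation and its covariance alongside the prediction because those two quantities describe how well the incoming measurement agrees with the no-failure model.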

The problem of failure detection is concerned with the detection of abrupt changes in a system, as
modeled by (1)-(3). Such abrupt changes can arise in a number of ways. For example, in aerospace applications,
one is often concerned with the failure of control actuators and surfaces. Such abrupt changes
can manifest themselves as shifts in the control gain matrix B, increased process noise, or as a bias in
equation (1) (as might arise if a thruster developed a leak [31]). In addition, failures of sensors may
take the form of abrupt changes in H, increases in measurement noise, or as biases in (2). For simplicity,
we will refer to abrupt changes in (1) as "actuator failures", and shifts in (2) will be called
"sensor failures." Again we point out that in many applications shifts in (1) or (2) may be used to model
changes in observed system behavior that have nothing to do with actuators or sensors.

The main task of a failure detection and compensation design is to modify the normal mode configuration
in order to include the capability of detecting abrupt changes and compensating for them by activating
back-up systems, adjusting the feedback design appropriately, etc. Conceptually, we think of the
detection-compensation system as part of the filtering portion of the feedback loop. As illustrated in
Figures 2 and 3, the resulting filter design can take one of two forms. Either we perform a complete
redesign of the filter, replacing (4)-(10) with a filter that is sensitive to failures, or we design a
system that monitors the normal system configuration and adjusts the system accordingly. We will discuss
examples of both of these structures.


Figure 3. Failure Detection System Involving a Monitoring
System  for  the  No-Failure  Configuration. 

As mentioned earlier, we will concentrate primarily on the problem of failure detection, which we
consider to consist of three tasks: alarm, isolation, and estimation. The alarm task simply consists
of making a binary decision: either that something has gone wrong or that everything is fine. The problem
of isolation is that of determining the source of the failure: which sensor or actuator has
failed, what type of arrhythmia has occurred, etc. Finally, the estimation problem involves the determination
of the extent of failure. For example, a sensor may become completely non-operational (an "off"
or "hard-over" failure), or it may simply suffer degradation in the form of a bias or increased inaccuracies,
which may be modeled as an increase in the sensor noise covariance. In the latter case, estimates
of the bias or the increase in noise may allow continued use of the sensor, albeit in a degraded mode.
Clearly the extent to which we need to perform these various tasks depends upon the application. If a
human operator is available, we may only be interested in generating an alarm that tells him to perform
further tests. In other systems in which back-ups are available, we might settle for failure isolation
without estimation. On the other hand, in the absence of hardware redundancy, we may be interested in
using a degraded instrument and thus would need estimation information.

Intuitively  we  can  associate  increased  software  system  complexity  with  the  tasks  — i.e.,  isolation 
requires  more  sophisticated  data  processing  than  an  alarm,  and  estimation  more  than  isolation.  On  the 
other  side,  as  we  increase  failure  detection  capabilities,  we  may  be  able  to  decrease  hardware  redundancy. 
Also, in some applications we may be able to delay isolation and estimation until after an alarm has been
sounded. In such a sequential structure, one increases detector complexity after a failure has been detected,
thereby reducing the computational burden during normal operation. Again the details of such
considerations  depend  upon  the  particular  application. 

Another  tradeoff  involving  failure  detection  system  complexity  involves  its  relation  to  detection 
system  performance.  For  example,  one  might  expect  that  one  could  achieve  better  alarm  performance  by 
using a priori knowledge concerning likely failure modes. That is, by looking for specific forms of system
behavior that are characteristic of certain failures, one should be able to improve detection performance.
Thus, it seems likely that alarm performance (as measured by the tradeoff between false alarms
and missed detections) will be improved if we attempt simultaneous detection, isolation, and estimation
of  failures.  This  tradeoff  of  complexity  vs.  performance  is  extremely  important  in  the  design  of  failure 
detection  systems. 

In  the  following  sections  we  will  discuss  several  failure  detection  methods  and  will  comment  on  their 
characteristics with respect to the issues mentioned in this and the preceding section. We have not provided
a general set of failure models to be considered, as the various techniques are based on quite different
failure models. These will be described as we discuss the various methodologies.

III.  "Failure-Sensitive"  Filters 

Our first class of failure detection concepts is aimed at overcoming the problem of an "oblivious
filter". As has been noted by many authors [1]-[3], [33], the optimal filter defined by (4)-(10) performs
well if there are no modelling errors; however, it is possible for the filter estimate to diverge
if there are substantial unmodeled phenomena. The problem occurs because the filter "learns the state
too well", i.e., the precomputed error covariance P and filter gain K become small, and the filter relies
on old measurements for its estimates and is oblivious to new measurements. Thus, if an abrupt
change occurs, the filter will respond quite sluggishly, yielding poor performance. Consequently, one
would like to devise filter designs that remain sensitive to new data so that abrupt changes will be
reflected in the filter behavior.


Two well-known techniques for keeping the filter sensitive to new data are the exponentially age-weighted
filter studied by Fagin [1] and Tarn and Zaborszky [2] and the limited memory filter proposed by
Jazwinski [3]. Others, such as increasing noise covariances or simply fixing the filter gain, are discussed
by Jazwinski in [33]. These techniques yield only indirect failure information. That is, if an
abrupt change occurs, these filters will respond faster than the normal filter, and one can base a failure
detection decision on sudden changes of $\hat{x}$.
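
The "oblivious filter" effect and the remedy of age-weighting can be illustrated with a scalar gain computation. In this sketch the age-weighting is implemented by multiplying the propagated error covariance by a factor alpha greater than one; that placement, the name alpha and the numerical values are assumptions chosen for illustration (one common way of writing a fading-memory filter), not a transcription of the algorithms in [1] or [2].

```python
def gain_history(steps, alpha=1.0, p0=1.0, phi=1.0, h=1.0, q=0.0, r=1.0):
    """Scalar filter gains over `steps` cycles; alpha > 1 discounts old data."""
    p_pred = p0
    gains = []
    for _ in range(steps):
        v = h * p_pred * h + r                  # innovation covariance
        k = p_pred * h / v                      # filter gain
        p_filt = p_pred - k * h * p_pred        # updated error covariance
        p_pred = alpha * phi * p_filt * phi + q # age-weighted time update (assumed form)
        gains.append(k)
    return gains

normal_gain = gain_history(200)[-1]               # alpha = 1: the normal mode filter
weighted_gain = gain_history(200, alpha=1.2)[-1]  # age-weighted filter
```

With no process noise the normal filter gain decays toward zero, so new measurements are progressively ignored. In this scalar setting the age-weighted gain instead settles at 1 - 1/alpha, which keeps the filter responsive to an abrupt change at the cost of passing more sensor noise.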

It is important to note a performance tradeoff evident in this method. As we increase our sensitivity
to new data (by effectively increasing the bandwidth of the Kalman filter), our system becomes
more sensitive to sensor noise, and the performance of the filter under no-failure conditions degrades.
In some cases this can be rather severe, and one may not be able to tolerate the degradation in overall
system performance under no-failure conditions. One might then consider a two filter system: the normal
mode filter (4)-(10) as the primary filter, with this type of failure-sensitive filter as an auxiliary
monitor, used only to detect abrupt changes. We remark that the tradeoff between detection performance
and filter behavior under normal conditions is a characteristic of all failure detection systems and is
analogous to the costs associated with false alarms and missed detection in standard detection problems
[41].

The techniques mentioned so far in this section are rather indirect failure detection approaches.
Several methods have been developed for the design of filters that are sensitive to specific failures.
One method involves the inclusion of several "failure states" in the dynamic model (1)-(3). Kerr [25]
has considered a procedure in which failure modes, such as the onset of biases, are included as state
variables. If the estimates of these variables vary markedly from their nominal values, a failure is
declared. A two-confidence interval overlap decision rule for failure detection using such failure
states is described and its performance is analyzed in [25]. Note that this approach provides failure
isolation and estimation at the expense of increased dimensionality and some performance degradation under
no-failure conditions (inclusion of the added states effectively opens up the bandwidth of the Kalman
filter).


An alternative to the addition of failure states to the dynamic model is the class of detector filters
developed by Beard [4] and Jones [5]. Their work has led to a systematic design procedure for the
detection of a wide variety of abrupt changes in linear time-invariant systems. They consider the
continuous-time, time-invariant, deterministic system model

$$\dot{x}(t) = A x(t) + B u(t) \tag{11}$$

$$z(t) = C x(t) \tag{12}$$

and design a filter of the form

$$\frac{d}{dt}\hat{x}(t) = A \hat{x}(t) + D\,\bigl(z(t) - C \hat{x}(t)\bigr) + B u(t) \tag{13}$$

The primary criterion in the choice of the gain matrix $D$ is not that (13) provide a good estimate $\hat{x}$
(as it is with observers or optimal estimators), but rather that the effects of certain failures are
accentuated in the filter residual

$$\gamma(t) = z(t) - C \hat{x}(t) \tag{14}$$

The basic idea is to choose $D$ so that particular failure modes manifest themselves as residuals which
remain in a fixed direction or in a fixed plane.

To illustrate the Beard-Jones approach, let us consider a simple example from [4]. Suppose we wish
to detect a failure of the ith actuator (i.e., of the actuator driven by the ith component of $u$). If we
assume the failure takes the form of a constant bias, our state equation becomes

$$\dot{x}(t) = A x(t) + B\,[u(t) + \nu e_i] = A x(t) + B u(t) + \nu b_i, \qquad t \ge t_0 \tag{15}$$

where $e_i$ is the ith standard basis vector, $b_i$ is the ith column of $B$, and $t_0$ is the (unknown) time of
failure. Suppose we consider the case of full state measurement, i.e., let $C = I$. In this case we
obtain a differential equation for the residual

$$\dot{\gamma}(t) = (A - D)\,\gamma(t) + \nu b_i \tag{16}$$

If we choose $D = \sigma I + A$, we obtain

$$\dot{\gamma}(t) = -\sigma \gamma(t) + \nu b_i, \qquad
\gamma(t) = e^{-\sigma (t - t_0)}\,\gamma(t_0) + \frac{\nu}{\sigma}\left[1 - e^{-\sigma (t - t_0)}\right] b_i \tag{17}$$

Thus, as the effect of the initial condition dies out, $\gamma(t)$ maintains a fixed direction ($b_i$) with
magnitude proportional to the failure size ($\nu$). Note that as we increase $\sigma$ (thus increasing filter gain), the
initial condition dies out faster, but the magnitude of the steady-state value of $\gamma$ decreases. Thus, if
there is any noise in the system, we cannot make $\sigma$ arbitrarily large.
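
The behavior of the residual in this example can be checked with a short simulation. The sketch below is only an illustration under stated assumptions: the matrices A and B, the values of the bias and of sigma, the zero input u(t) = 0, and the forward-Euler integration are all choices made here, not taken from [4]. It integrates the plant (15) and the detector filter (13) with C = I and D = sigma*I + A, and returns the final residual.

```python
def mat_vec(M, v):
    """Product of a matrix (list of rows) and a vector (list)."""
    return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]


def simulate_residual(A, B, i, nu, sigma, t_fail, t_end, dt=1e-3):
    """Euler simulation of plant and detector filter; returns (final residual, b_i)."""
    n = len(A)
    b_i = [B[r][i] for r in range(n)]                      # ith column of B
    # Detector gain D = sigma*I + A (full state measurement, C = I).
    D = [[A[r][c] + (sigma if r == c else 0.0) for c in range(n)] for r in range(n)]
    x = [1.0] * n        # true state (arbitrary initial condition)
    x_hat = [0.0] * n    # filter state
    for step in range(int(round(t_end / dt))):
        bias = nu if step * dt >= t_fail else 0.0          # actuator bias after t_fail
        gamma = [x[r] - x_hat[r] for r in range(n)]        # residual z - x_hat with C = I
        ax, axh, dg = mat_vec(A, x), mat_vec(A, x_hat), mat_vec(D, gamma)
        x = [x[r] + dt * (ax[r] + bias * b_i[r]) for r in range(n)]
        x_hat = [x_hat[r] + dt * (axh[r] + dg[r]) for r in range(n)]
    return [x[r] - x_hat[r] for r in range(n)], b_i


# Assumed example data: a stable two-state plant with two actuators.
A = [[0.0, 1.0], [-2.0, -3.0]]
B = [[0.0, 1.0], [1.0, 0.0]]
gamma, b_i = simulate_residual(A, B, i=0, nu=2.0, sigma=5.0, t_fail=1.0, t_end=5.0)
print(gamma, [2.0 / 5.0 * b for b in b_i])
```

After the transient has decayed, the simulated residual lies along b_i with magnitude nu/sigma, in agreement with (17); repeating the run with a larger sigma shortens the transient and shrinks the steady-state residual.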

In their work Beard and Jones consider the design of such filters for an extremely wide variety of
failure modes, including actuator and sensor shifts and shifts in $A$ and $B$. The initial deterministic
analysis for all of these cases was considered by Beard [4], while a systematic design procedure is
given by Jones [5] for the design of the gain $D$ to allow detection of several failure modes. Jones'
approach is quite geometric in nature, and his formulation allows one to gain considerable insight into
the detection problem. As pointed out in [5], the gain selection problem is quite similar to the output
decoupling problem and requires the introduction of the important concept of "mutually detectable
failure modes" in order to answer the question of whether or not one can simultaneously distinguish
between several types of failures. Thus the question of failure isolation is of central importance in the
design methodology derived in [5].

The results in [4], [5] represent perhaps the most thorough study of the basic concepts underlying
failure detection. The tradeoff between detection and filter performance is discussed in depth in [5],
and an attempt is made in [4] to introduce the concept of the level of redundancy in a dynamical system.

As mentioned in the example, the basic design procedure is deterministic. However, in this simple
example we can see how one can take noise into account. If the system (11), (12) contains noise, we have
seen that one may not wish to make the scalar $\sigma$ as large as possible. In fact, one could choose $\sigma$ so as
to minimize the mean-square estimation error in the detector filter when there is no failure. In his
thesis [5], Jones describes a procedure in which one first chooses the structure of $D$ for failure
detection purposes and then chooses the remaining free parameters in order to minimize the estimation
error covariance. Although this yields a suboptimal filter design, it may work quite well, as it did in
the problem reported in [5].

In summary, the Jones-Beard design methodology is extremely useful conceptually, can be used to
detect a wide variety of failures, and provides detailed failure isolation information. It is suboptimal
as an estimator, and if this presents a serious problem, one might wish to use the detector filter as an
auxiliary monitoring system. This appears to be only a minor drawback, and the major limitation of the
approach is its applicability only to time-invariant systems.

IV. Voting Systems

Voting techniques are often useful in systems that possess a high degree of parallel hardware
redundancy. Memoryless voting methods can work quite well for the detection of "hard" or large failures,
and the papers of Gilmore and McKern [6], Pejsa [7], and Ephgrave [8] discuss the successful application
of voting techniques to the detection of hard gyro failures in inertial navigation systems.

In standard voting schemes, one has (at least) three identical instruments. Simple logic is then
developed to detect failures and eliminate faulty instruments. For example, if one of the three
redundant signals differs markedly from the other two, the differing signal is eliminated. Recently, Broen
[9] has developed a class of voter-estimators that possesses advantages relative to standard voting
techniques. Consider the dynamical system


$$x(k+1) = \Phi x(k) \tag{18}$$

with a triply redundant set of sensors

$$\begin{aligned}
y_1(k) &= H_1 x(k) + v_1(k) \\
y_2(k) &= H_2 x(k) + v_2(k) \\
y_3(k) &= H_3 x(k) + v_3(k)
\end{aligned} \tag{19}$$


Broen develops a set of recursive filter equations for computing the estimate $\hat{x}(k)$ that minimizes

$$\sum_{i=0}^{k} \sum_{j=1}^{3} w_{ji}\, \gamma_j'(i)\, R_j^{-1}\, \gamma_j(i) \tag{20}$$

where $R_j$ is the covariance of the measurement noise $v_j$, and $\gamma_j$ is the innovations sequence

$$\gamma_j(i) = y_j(i) - H_j \Phi^{\,i-k}\, \hat{x}(k) \tag{21}$$

Here $w_{ji}$ is a function of $y_1(i)$, $y_2(i)$, $y_3(i)$ which is large if $y_j(i)$ is close to the other two $y_\ell(i)$
and is small if $y_j(i)$ deviates greatly from the other two. In this way, one obtains a "soft" voting
procedure in which faulty sensors are smoothly removed from consideration. This greatly alleviates the
cost of false alarms, but the price is the on-line computation of the filter gain (which is a function
of the $w_{ji}$). Note that in equation (19), Broen appears to allow the $y_j$ to be physically different
sensors (different $H_j$'s), but the analysis of his paper makes it clear that he requires identical
sensors, i.e., $H_1 = H_2 = H_3$.

Voting schemes are in general relatively easy to implement and usually provide fast detection of
hard failures, but they are only applicable in systems possessing a high level of parallel redundancy.
They do not in general take advantage of redundant information provided by unlike sensors, and thus
cannot detect failures in single or even doubly redundant sensors. In addition, voting techniques can
have difficulties in detecting "soft" failures (such as a small bias shift).
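
The difference between standard and "soft" voting can be sketched for three redundant scalar signals. This is a memoryless illustration only: the threshold logic, the weight function, and the parameter names are assumptions made here and are not Broen's recursive voter-estimator [9].

```python
def hard_vote(y, threshold):
    """Standard triplex vote on three scalar signals.

    Returns (estimate, index of the rejected signal or None).
    """
    for j in range(3):
        others = [y[m] for m in range(3) if m != j]
        others_agree = abs(others[0] - others[1]) <= threshold
        j_disagrees = all(abs(y[j] - o) > threshold for o in others)
        if others_agree and j_disagrees:
            return sum(others) / 2.0, j     # average the two agreeing signals
    return sum(y) / 3.0, None               # no single outlier: use all three


def soft_weights(y, scale):
    """Hypothetical smooth weights: large when y[j] is near the median, small otherwise."""
    median = sorted(y)[1]
    raw = [1.0 / (1.0 + ((yj - median) / scale) ** 2) for yj in y]
    total = sum(raw)
    return [w / total for w in raw]


signals = [1.0, 1.1, 5.0]                   # third sensor has a hard failure
estimate, rejected = hard_vote(signals, threshold=0.5)
weights = soft_weights(signals, scale=0.5)
soft_estimate = sum(w * yj for w, yj in zip(weights, signals))
print(estimate, rejected, weights, soft_estimate)
```

The hard voter removes the third signal outright, whereas the soft weights reduce its influence continuously, so a marginal disagreement changes the estimate only slightly instead of triggering an all-or-nothing decision.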

V. Multiple Hypothesis Filter-Detectors

A rather large class of adaptive estimation and failure detection schemes involves the use of a
"bank" of linear filters based on different hypotheses concerning the underlying system behavior. In
the work of Athans and Willner [10] and Lainiotis [11], several different sets of system matrices are
hypothesized. Filters for each of the models are constructed, and the innovations from the various
filters are used to compute the conditional probability that each system model is the correct one. In this
manner, one can do simultaneous system identification and state estimation. In addition, an abrupt
change in the probabilities can be used to detect changes in true system behavior. This technique has
been investigated in the context of the adaptive control of the F-8C digital fly-by-wire aircraft by
Athans, Dunn, Greene, et al. [35] and also has been applied to the problem of classifying rhythms and
detecting rhythm shifts in electrocardiograms. Extremely good results in the latter case are reported
by Gustafson, Willsky, and Wang in [36].
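
The probability computation performed by such a bank can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of [10] or [11]: it uses a scalar model x(k+1) = a x(k) + w(k), z(k) = x(k) + v(k), two hypothesized values of a, and illustrative noise levels and class and function names chosen here. Each filter reports its innovation and innovation variance, and Bayes' rule reweights the hypotheses.

```python
import math
import random


class ScalarKalman:
    """Kalman filter for x(k+1) = a x(k) + w(k), z(k) = x(k) + v(k)."""

    def __init__(self, a, q, r):
        self.a, self.q, self.r = a, q, r
        self.x, self.p = 0.0, 1.0

    def step(self, z):
        """Process one measurement; return (innovation, innovation variance)."""
        x_pred = self.a * self.x
        p_pred = self.a * self.a * self.p + self.q
        gamma = z - x_pred
        v = p_pred + self.r
        k = p_pred / v
        self.x = x_pred + k * gamma
        self.p = (1.0 - k) * p_pred
        return gamma, v


def gaussian_pdf(gamma, v):
    """Zero-mean normal density with variance v, evaluated at gamma."""
    return math.exp(-0.5 * gamma * gamma / v) / math.sqrt(2.0 * math.pi * v)


def update_probabilities(probs, filters, z):
    """One Bayes update of the model probabilities from the filter innovations."""
    likelihoods = [gaussian_pdf(*f.step(z)) for f in filters]
    posterior = [p * l for p, l in zip(probs, likelihoods)]
    total = sum(posterior)
    return [p / total for p in posterior]


def run_demo(a_true=0.95, steps=200, seed=0):
    """Simulate the true system and return the final model probabilities."""
    rng = random.Random(seed)
    q, r = 1.0, 0.1
    filters = [ScalarKalman(a, q, r) for a in (0.5, 0.95)]   # hypothesized models
    probs = [0.5, 0.5]
    x = 0.0
    for _ in range(steps):
        x = a_true * x + rng.gauss(0.0, math.sqrt(q))
        z = x + rng.gauss(0.0, math.sqrt(r))
        probs = update_probabilities(probs, filters, z)
    return probs


print(run_demo())
```

With data generated by the second model, the probability mass moves to that hypothesis. A later change in the true system would appear as a shift in these probabilities, provided they have not already collapsed so far that the bank no longer responds to new data.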


Techniques involving multiple hypotheses have also been used to design failure detection systems.
Montgomery, Caglayan, and Price [12], [13] have used such a technique for digital flight control
systems and have studied its robustness in the presence of nonlinearities via simulations. Recently a
technique involving a bank of observers has been devised [34], and a successful application to a
hydrofoil sensor failure problem is reported by Clark, Fosth, and Walton in [34]. Also, Willsky, Deyst, and
Crawford [15], [16] have applied the methodology devised by Buxbaum and Haddad in [14] to study failure
detection for an inertial navigation problem. We will briefly describe this technique to illustrate
some of the concepts underlying the bank of filters approach. We also refer the reader to Wernersson
[42] for a technique that is similar to that discussed in [16].


Consider the system

$$x(k+1) = \Phi(k) x(k) + w(k) \tag{22}$$

$$z(k) = H(k) x(k) + v(k) \tag{23}$$

We are interested in detecting sudden shifts in certain of the components of $x$ (e.g., bias states). We
model these shifts by choosing the distribution of $w$ appropriately. Let $\{f_1, \ldots, f_r\}$ be the set of
hypothesized failure directions. We then assume that $w$ has a high probability of being the usual
process noise and a small probability of including a burst of noise in each of the failure directions.
Thus the density for $w(k)$ is

$$p_0\, N(0, Q) + \sum_{i=1}^{r} p_i\, N(0,\; Q + \sigma_i^2 f_i f_i') \tag{24}$$

$$\sum_{i=0}^{r} p_i = 1 \tag{25}$$

Here $N(m, P)$ is a normal density with mean $m$ and covariance $P$, and the $p_i$, $i = 1, \ldots, r$, are the
(small) probabilities of a burst in each of the failure directions.

If we hypothesize such a density at each point in time and if we assume that $x(0)$ is normally
distributed, we have the following expression for the conditional density of $x(k)$ given $z(1), \ldots, z(k)$:

$$p(x, k) = \sum_{i_0=0}^{r} \cdots \sum_{i_{k-1}=0}^{r} p_{\mathbf{i}}\, N(m_{\mathbf{i}}, P_{\mathbf{i}}) \tag{26}$$

Here $\mathbf{i} = (i_0, \ldots, i_{k-1})$ and the density has the following interpretation. Let
$\mathbf{j} = (j_0, \ldots, j_{k-1})$ be a random k-tuple where $j_s = i$ if there is a shift in the $f_i$ direction
at time $s$ ($i = 0$ is used to denote no shift). Then

$$p_{\mathbf{i}} = \Pr(\mathbf{j} = \mathbf{i} \mid z(1), \ldots, z(k)) \tag{27}$$

and $m_{\mathbf{i}}$ and $P_{\mathbf{i}}$ are the mean and covariance of the Kalman filter designed assuming
$\mathbf{j} = \mathbf{i}$ (i.e., assuming $w(s)$ has covariance $Q + \sigma_{i_s}^2 f_{i_s} f_{i_s}'$). The $p_{\mathbf{i}}$ can be
computed in a sequential manner as a function of the various filter innovations. We refer the reader to
[14]-[16] for the details of the calculations.


Note that the implementation of (26) requires an exponentially growing bank of filters (there are
$(r+1)^k$ terms in (26)). To avoid this problem a number of approximation techniques have been proposed
[14]-[16]. The one used in [16] involves hypothesizing shifts only once every $N$ steps. At the end
of each $N$-step period we "fuse" the $(r+1)$ densities into a single density and begin the procedure again.
In this way we implement only $(r+1)$ filters at any time. We note that the techniques devised in [10]-
[12] do not involve growing banks of filters (as the number of hypothesized models does not grow in time).
However, it is possible for all of the filters in the bank to become oblivious, and thus shifts between
the hypotheses may go undetected (see [16], [36] for examples). The technique of periodic fusing of the
densities and initiation of a new bank effectively avoids this problem (as would designing the original
bank using age-weighted filtering techniques).
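
The growth of the bank, and one possible way of collapsing it, can be sketched in a few lines. The text does not specify how the densities are fused; the moment-matching rule below (a single Gaussian with the same mean and variance as the mixture) is an assumption used here for illustration, written for the scalar case, and the function names are hypothetical.

```python
def num_hypotheses(r, k):
    """Number of terms in the mixture after k steps with r failure directions."""
    return (r + 1) ** k


def fuse(weights, means, variances):
    """Collapse a scalar Gaussian mixture into one Gaussian by moment matching."""
    total = sum(weights)
    w = [wi / total for wi in weights]                      # normalized probabilities
    mean = sum(wi * mi for wi, mi in zip(w, means))
    # Mixture variance = average variance plus the spread of the component means.
    var = sum(wi * (pi + (mi - mean) ** 2) for wi, mi, pi in zip(w, means, variances))
    return mean, var


print(num_hypotheses(2, 10))                    # unpruned bank after 10 steps, r = 2
print(fuse([0.5, 0.5], [0.0, 2.0], [1.0, 1.0]))  # two equally likely hypotheses
```

The fused variance is larger than either component variance whenever the component means differ, which keeps the restarted bank responsive to new data after each fusion.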

The technique described above was applied to the problem of detecting gyro and accelerometer bias
shifts in a time-varying inertial calibration and alignment system. The results of these tests are
extremely impressive. This is not surprising, as the multiple hypothesis method computes precisely the
quantities of interest: the probabilities of all types of failures under consideration. The cost
associated with such a high level of performance is an extremely complex failure detection system. Note,
however, that the parallel structure of the system allows one to consider highly efficient parallel
processing computer implementations. In addition, the use of reduced-order filters for the various failure
hypotheses may increase the practicality of such a scheme, or one might consider the use of a simpler
detection-only system to detect failures, with a switch to a multiple hypothesis procedure for failure
isolation and estimation after a failure has been detected.


However, even if such a failure detection scheme cannot be implemented in a particular application,
it provides a useful benchmark for comparison with simpler techniques. In addition, by studying the
simulation of a multiple hypothesis method, one can gain useful insight into the dynamics of failure
propagation and detection (see the discussion in [16]).

McGarty [23] has developed a method for rejecting bad measurements that bears some similarity to
the approach just discussed. Each measurement has a binary random variable $g(k)$ associated with it. If
$g(k) = 1$ the measurement is "good" (i.e., the measurement contains the signal of interest), while $g(k) = 0$
denotes a bad data point (the measurement is pure noise). McGarty devises a maximum likelihood approach
for estimating the values of the exponentially growing set of possibilities ($g(i) = 1$ or $0$, $i = 1, \ldots, k$).
He also allows these variables to have a sequential correlation (i.e., knowing that the present
measurement is good or bad says something about the next observations). A computationally feasible
approximation method is devised and simulation results are described. We refer the reader to [23] for details.

Recently, Athans, Whiting, and Gruber [51] have also considered the problem of designing an estimator
that can detect and remove bad or false measurements. Their approach is Bayesian in nature, i.e., an
estimate is generated of the a posteriori probability that a given measurement is false. The method of
calculation of these pseudo-probabilities is quite similar to that used in the other multiple hypothesis
methods (see [10]-[14]). The reader is referred to [51] for details of the analysis and for a discussion
of some successful simulation results.


VI.  Jump  Process  Formulations 


The problem of the detection of abrupt changes in dynamical systems suggests the use of jump process
techniques in devising system design methodologies (see [39], [49]-[50] for general results on jump
processes). One models potential failures as jumps, characterized by a priori distributions which reflect
initial information concerning failure rates. The sizes of the possible failures are usually taken to be
known. One could, however, model failure magnitude as a random variable. This leads to a compound jump
process formulation which greatly complicates the desired analysis. In any event, taking such a jump
process formulation, one can devise failure-sensitive control laws and methods for computing the
conditional probability of failure. Control problems of this type have received a great deal of attention in
the literature. Sworder and Robinson [17]-[20], [37] and Ratner and Luenberger [21] have considered the
design of control laws which take into account the possibility of sudden shifts in system matrices. The
results they have obtained are for the full-state feedback problem with no system randomness other than
the jumping of the system matrices among a finite set of possible matrices.

Davis [22] has utilized nonlinear estimation techniques to solve a fault detection problem. His
formulation is as follows: consider the scalar stochastic equations

$$dx(t) = a(t)\,x(t)\,dt + g(t)\,dv(t) \tag{28}$$

$$dy(t) = h(t)\,x(t)\,dt + dw(t) \tag{29}$$

where $w$ and $v$ are independent Brownian motion processes and

$$a(t) = a_0(t)\,[1 - \xi(t)] + a_1(t)\,\xi(t) \tag{30}$$

where

$$\xi(t) = \begin{cases} 0, & t < T \\ 1, & t \ge T \end{cases} \tag{31}$$

and $T$ is a random variable. Here we interpret $a_0$ as the unfailed dynamics, and $a_1$ represents the failure
mode. Davis derives the optimal, infinite-dimensional equations for the computation of the conditional
mean of $x$ and the conditional probability

$$\hat{\xi}(t|t) = \Pr[\,t \ge T \mid y(s),\ 0 \le s \le t\,] \tag{32}$$

An implementable approximation is described in [22], but its performance has not yet been evaluated.

Note that Davis' method leads to an estimate of $x$ that is suboptimal under no-failure conditions.
Chien [24] has devised a jump process formulation that avoids this difficulty for the problem of the
detection of a jump or a ramp in a gyro bias. He considers the dynamical model

$$\dot{x}(t) = \omega x(t) + w(t) \tag{33}$$

where $w$ is a white noise process. Three hypotheses are conjectured for the form of the gyro output:

Normal Mode $H_0$

$$z(t) = x(t) + v(t) \qquad \forall t \tag{34}$$

Bias Mode $H_1$

$$z(t) = x(t) + m\,\xi(t) + v(t) \tag{35}$$

Ramp Mode $H_2$

$$z(t) = x(t) + n\,(t - T)\,\xi(t) + v(t) \tag{36}$$

where $n$ and $m$ are unknown constants, $v$ is white noise, $T$ is the time of failure, and $\xi(t)$ is as in (31).

Chien's approach is as follows: design a filter based on $H_0$ (which will thus yield the optimal
estimate for $t < T$, assuming no false alarms occur), and determine the steady-state effect of the degradations
$H_1$ and $H_2$ on the filter residuals. If one then hypothesizes a failure rate $q$, i.e.,

$$P(T > t) = e^{-qt} \tag{37}$$

and if one further assumes a nominal size for the bias $m$, one can then compute an approximate stochastic
differential equation for $\Pr(H_1 \mid z(s),\ s \le t)$, in which the input to this equation is the residual $\gamma$ of the
$H_0$ filter. The details of the analysis are described in [24].
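
The flavor of such a probability calculation can be conveyed by a discrete-time analogue. The sketch below is not Chien's stochastic differential equation from [24]; it is a simple Bayesian recursion written here under several assumptions: scalar residuals that are white with variance v, a failure that shifts the residual mean from zero to a nominal value m (standing in for the steady-state effect of the bias), and a per-step failure probability q.

```python
import math


def failure_probability(residuals, m, v, q):
    """Posterior probability, after each residual, that the bias failure has occurred."""
    pi = 0.0
    history = []
    for gamma in residuals:
        # Prior propagation: an unfailed system fails during this step with probability q.
        pi_pred = pi + (1.0 - pi) * q
        # Residual likelihoods with and without the bias (common factor omitted).
        like_fail = math.exp(-0.5 * (gamma - m) ** 2 / v)
        like_ok = math.exp(-0.5 * gamma ** 2 / v)
        # Bayes update.
        pi = pi_pred * like_fail / (pi_pred * like_fail + (1.0 - pi_pred) * like_ok)
        history.append(pi)
    return history


# Assumed data: 20 nominal residuals followed by 20 residuals carrying the bias.
residuals = [0.0] * 20 + [1.0] * 20
history = failure_probability(residuals, m=1.0, v=0.25, q=0.01)
print(history[19], history[-1])
```

Because the prior is re-injected at every step through q, the probability never locks onto "no failure", so the recursion stays sensitive to a failure that occurs late in the record.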

For his problem Chien is able to demonstrate that his detection procedure, based on the assumption
of a nominal value for the bias failure $m$, has the capability of detecting biases larger than $m$ and also
can be used to detect ramps (mode $H_2$). Of course, the delay times until detection in these cases are
greater than if one implemented a filter based on the proper bias size or if one were looking for a ramp
(indicating the potential usefulness of estimating the failure magnitude). The major advantages of
Chien's approach are the simplicity of the detector (implementation of a scalar stochastic equation) and
the fact that one obtains an estimate of precisely the quantity of interest: the conditional probability
of failure. The simplicity of the scheme may, in fact, make it a great deal more robust in the face of
system modelling errors (such as the use of an extremely simplified gyro error model) than more
sophisticated approaches. Also, this approach leads to no degradation in performance prior to detection of
the failure. In addition, the use of a probabilistic description of the time of failure allows one to
avoid the problem of the oblivious filter, i.e., the fact that a failure can occur at any time has been
incorporated in the design, which therefore will remain sensitive to new data.

The drawbacks of the scheme are the use of a fixed bias size and the use of the steady-state effect
of the failure on the filter residual. The first of these may not be too much of a problem (as Chien has
pointed out), but the second may cause difficulties. Specifically, this limits the approach to
time-invariant systems and filters. In addition, as the transient effect of the failure has been ignored, it
may be difficult to make quick detections of certain changes (i.e., we may have to wait until the transient
dies out). In the next section we will discuss an approach (the GLR method) which has several concepts
in common with Chien's approach and which allows one to overcome these two drawbacks (at the cost of
added computational complexity, of course).


In summary, jump process formulations appear to be quite natural for failure detection problems.
One usually makes approximations in the analysis in order to obtain implementable solutions. These
simplifications impose some limitations on the capabilities of the designs, but there is at present no
systematic analytical procedure for evaluating these limitations or for studying tradeoffs between design
complexity and system performance.

VII.  Innovations-Based  Detection  Systems 

Chien's failure detection technique can also be placed in the class of failure detection methods
that involve the monitoring of the innovations of a filter based on the hypothesis of normal system
operation. In such a configuration the overall system uses the normal filter until the innovations
monitoring system detects some form of aberrant behavior. The fact that the monitoring system can be
attached to a filter-controller feedback system is particularly appealing, since overall system behavior
is not disturbed until after the monitor signals a failure and since the monitoring system can be
designed to be added to an existing system.

Mehra and Peschon [26] have suggested a number of possible statistical tests to be performed on the
innovations. One of these is a chi-squared test which was applied in [15], [16] by Willsky, Deyst and
Crawford. Let $\gamma(k)$ be the p-dimensional innovations for the filter defined by (4)-(10). If the system
is operating normally, the innovations sequence is zero-mean and white with known covariance $V(k)$. In this case
the quantity

$$\ell(k) = \sum_{j=k-N+1}^{k} \gamma'(j)\, V^{-1}(j)\, \gamma(j) \tag{38}$$

is a chi-squared random variable with $Np$ degrees of freedom [26], [15], [16]. If a system abnormality
occurs, the statistics of $\gamma$ change, and one can consider a detection rule of the form

$$\ell(k) \;\underset{\text{NO FAILURE}}{\overset{\text{FAILURE}}{\gtrless}}\; \varepsilon \tag{39}$$

With the aid of chi-squared tables, one can compute the probability $P_F$ of false alarm as a function of
the innovations window length $N$ and the decision threshold $\varepsilon$. The probability of correct detection
depends upon the particular failure mode (see [16] and the discussion of the GLR approach to follow).
We note that for a given failure mode, as $N$ increases the detection probability may decrease, i.e., by
averaging a larger number of residuals we smooth out the effect of a failure on $\gamma$, and the detector may
become somewhat oblivious (or at the very best respond quite slowly) to new data. On the other hand,
too small a value of $N$ may yield an unacceptably high value of $P_F$.
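
The test (38), (39) is straightforward to implement. The following is a minimal sketch for scalar innovations (p = 1); the class name and the example data are assumptions made here. The threshold 23.209 is the tabulated upper 1 percent point of the chi-squared distribution with 10 degrees of freedom, corresponding to N = 10 and a false alarm probability of 0.01.

```python
from collections import deque


class ChiSquaredMonitor:
    """Sliding-window chi-squared test on scalar innovations."""

    def __init__(self, window, threshold):
        self.terms = deque(maxlen=window)   # keeps only the last N normalized squares
        self.threshold = threshold

    def update(self, gamma, v):
        """Add one innovation with variance v; return (statistic, alarm flag)."""
        self.terms.append(gamma * gamma / v)
        stat = sum(self.terms)
        window_full = len(self.terms) == self.terms.maxlen
        return stat, window_full and stat > self.threshold


monitor = ChiSquaredMonitor(window=10, threshold=23.209)
# Assumed data: ten one-sigma innovations, then ten two-sigma innovations.
innovations = [1.0] * 10 + [2.0] * 10
results = [monitor.update(g, 1.0) for g in innovations]
first_alarm = next(k for k, (_, alarm) in enumerate(results) if alarm)
print(results[9], first_alarm)
```

In this example the alarm is raised only after several abnormal innovations have entered the window, which illustrates how a long window delays the response to a failure.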

The implementation of the chi-squared test (38), (39) is quite simple, but, as one might expect,
one pays for this simplicity with rather severe limitations on performance. As described in [15], [16],
this method was applied to the same inertial calibration and alignment problem to which the
Buxbaum-Haddad multiple hypothesis approach [14]-[16], described in Section V, was applied. The performance of
the chi-squared test was mixed. The method is basically an alarm method, i.e., the system (38), (39)
makes no attempt to isolate failures, and one finds that those failure modes that have dramatic effects
on $\gamma$ are detectable by this method; however, more subtle failures are more difficult to detect with this simple
scheme. Comparing the performance of the multiple hypothesis and chi-squared systems, we see that in
some cases we can obtain superior alarm capabilities if we simultaneously attempt to do failure isolation
and estimation. One can obtain some failure isolation information by considering the components of $\gamma$
separately (this may be especially useful for sensor failures), and we refer the reader to [15], [16] for
a detailed discussion of this and other aspects of the chi-squared method.


Another innovations-based approach, developed by Merrill [27], is motivated by a desire to suppress
bad sensor data. Merrill devises a modification of the least squares criterion in order to suppress
extremely large residuals (which are given a very large weighting in the usual least squares framework),
and he applies his methodology to a power system application.

A final technique in this category has been studied by several researchers (Willsky and Jones
[28], [29], McAulay and Denlinger [30], Deyst and Deckert [31], Sanyal and Shen [32], and Chow, Dunn and
Willsky [38]), and we will describe the most general formulation of the approach, developed in [28], [29].
This technique, which we call the generalized likelihood ratio (GLR) approach, was in part motivated by
the shortcomings of the simpler chi-squared procedure. The GLR approach, which can be applied to a wide
range of actuator and sensor failures, makes an attempt to isolate different failures by using knowledge
of the different effects such failures have on the system innovations. The method provides an optimum
decision rule for failure detection and provides useful failure identification information for use in
system reorganization subsequent to the detection of a failure. In addition, one can devise a number of
simplifications of the technique and can study analytically the tradeoff between GLR complexity and GLR
performance.

Consider the basic dynamical model (1)-(3). The following are four possible modifications of these
equations that incorporate certain sudden system changes (see Willsky and Jones [28], [29] and Gustafson,
Willsky, and Wang [36] for physical motivation for these and other failure modes of the same general
type):

Dynamic Jump

$$x(k+1) = \Phi(k) x(k) + B(k) u(k) + w(k) + \nu\, \delta_{k+1,\theta} \tag{40}$$

Here $\nu$ is an unknown n-vector, $\theta$ is the unknown time of failure, and $\delta_{ij}$ is the Kronecker delta. Such a
model can be used to model sudden shifts in bias states (as in the inertial problem studied in [15],
[16]).

Dynamic Step

$$x(k+1) = \Phi(k) x(k) + B(k) u(k) + w(k) + \nu\, \sigma_{k+1,\theta} \tag{41}$$

Here $\sigma_{ij}$ is the unit step

$$\sigma_{ij} = \begin{cases} 1, & i \ge j \\ 0, & i < j \end{cases} \tag{42}$$

This model can be used to model certain actuator failures (compare to the Beard-Jones example in Section
III; see equation (15)).

Sensor Jump

$$z(k) = H x(k) + J u(k) + v(k) + \nu\, \delta_{k,\theta} \tag{43}$$

We can use this to model bad data points.

Sensor Step

$$z(k) = H x(k) + J u(k) + v(k) + \nu\, \sigma_{k,\theta} \tag{44}$$

Sudden changes in sensor biases fit into this model.

By the linearity of the system (1)-(3) and the filter (4)-(10), one can determine the effect of each
of the failure modes on the innovations. The general form is

$$\gamma(k) = G(k;\theta)\,\nu + \tilde{\gamma}(k) \tag{45}$$

where $\tilde{\gamma}(k)$ is the filter innovations if no failure occurs, and the matrix $G$ can be precomputed (see [29],
[33]). This matrix, which is different for each of the four cases (40)-(44), is called the failure
signature matrix and provides us with an explicit description of how various failures propagate through the
system and filter.

The full-blown GLR method involves the following: we assume we are looking for one of the four
classes of failures and have computed the appropriate signature matrix. Given the residuals, we compute
the maximum likelihood estimates of $\nu$ and $\theta$, and, assuming that these estimates are correct, we compute
the log-likelihood ratio for failure versus no failure (see Van Trees [41] for a general discussion of
GLR methods). The implementation of the full GLR requires a linearly growing bank of matched filters,
computing the best estimates of $\nu$ assuming a particular value of $\theta \in \{1, \ldots, k\}$.
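
The structure of this computation can be sketched for scalar innovations. In the sketch below, the function names, the constant signature (a failure that shifts every later residual by the same amount), and the example data are illustrative assumptions; in practice the signature comes from the system and filter. For each candidate failure time the matched filter forms d (the correlation of the residuals with the signature) and C (the information about the failure magnitude); the magnitude estimate is d/C, and d*d/C equals twice the log-likelihood ratio for that candidate time.

```python
def glr(residuals, variances, signature):
    """Full GLR search over the failure time for scalar residuals.

    residuals[j-1] and variances[j-1] belong to time j = 1..k; signature(j, theta)
    is the scalar failure signature G(j; theta), used only for j >= theta.
    Returns (theta_hat, nu_hat, statistic).
    """
    k = len(residuals)
    best = None
    for theta in range(1, k + 1):            # one matched filter per candidate time
        d = 0.0
        c = 0.0
        for j in range(theta, k + 1):
            g = signature(j, theta)
            d += g * residuals[j - 1] / variances[j - 1]
            c += g * g / variances[j - 1]
        if c == 0.0:
            continue                          # failure not observable for this theta
        stat = d * d / c
        if best is None or stat > best[2]:
            best = (theta, d / c, stat)
    return best


def step_signature(j, theta):
    """Hypothetical signature: the failure adds a constant to every later residual."""
    return 1.0


# Assumed data: small alternating disturbance plus a shift of size 2 starting at time 6.
residuals = [0.1 * (-1) ** j + (2.0 if j >= 6 else 0.0) for j in range(1, 13)]
theta_hat, nu_hat, stat = glr(residuals, [1.0] * 12, step_signature)
print(theta_hat, nu_hat, stat)
```

A failure is declared when the maximized statistic exceeds a threshold. The loop over all candidate times is what makes the bank grow linearly with k.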

A number of remarks can be made concerning the GLR system. We note that, as with other methods such
as Buxbaum-Haddad or Chien, the inclusion of the variable θ to indicate our uncertainty as to the time of
failure keeps the detection system sensitive to new data. However, it is the estimation of θ that causes
the growing complexity problem. On the other hand, even if the full GLR is not implementable, it can
serve as a benchmark for other schemes and can in fact be used as a starting point for the design of
simpler systems. One simplification that eliminates the growing complexity is the restriction of the
estimate of θ to a window

k-N ≤ θ ≤ k-M   (45)

where the lower bound is included to limit complexity, and the upper bound is set by failure observability
and false alarm considerations. Successful simulation runs with N=M (i.e., when we don't optimize θ at
all and have only one matched filter for ν) are reported by Willsky and Jones in [29]. We remark only
that the price one pays for "windowing" the estimate of θ is a reduction in the accuracy of the
estimate of ν. For example, in the case of N=M, we often are able to detect failures extremely quickly, but
if θ=k-N is not the correct time of failure, the estimate of ν may be severely degraded (e.g., our
estimate of the slope of a ramp changes as we change our estimate of the time at which it started). We note
that the estimation of θ is similar to time-of-arrival estimation problems that arise in various
applications, and refer the reader to Van Trees [44] for a general discussion of several techniques.
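The windowing idea can be illustrated with a minimal scalar sketch. Everything in it is an assumption made for illustration: the residuals are scalar with constant variance, time indices start at 0, and the names `windowed_glr` and `step_signature` are hypothetical. The signature used is a pure unit step, i.e., a steady-state signature with no transient.

```python
def windowed_glr(residuals, signature, variance, k, N, M):
    """Scalar GLR restricted to the window k-N <= theta <= k-M.

    residuals: list of scalar innovations, indexed from 0 (assumption)
    signature: function g(j, theta) giving the scalar failure signature
    variance:  innovations variance, assumed constant
    Returns (theta_hat, nu_hat, statistic), or None if no theta is usable.
    """
    best = None
    for theta in range(max(0, k - N), k - M + 1):
        terms = range(theta, k + 1)
        # Matched-filter sums for this hypothesized failure time.
        d = sum(signature(j, theta) * residuals[j] for j in terms) / variance
        c = sum(signature(j, theta) ** 2 for j in terms) / variance
        if c == 0.0:
            continue
        statistic = d * d / c
        if best is None or statistic > best[2]:
            best = (theta, d / c, statistic)
    return best


def step_signature(j, theta):
    """Hypothetical signature: a unit step in the residuals from time theta on."""
    return 1.0 if j >= theta else 0.0


# Noise-free residuals with a step of size 2 starting at time 5.
residuals = [0.0] * 5 + [2.0] * 5
full = windowed_glr(residuals, step_signature, 1.0, k=9, N=9, M=0)
single = windowed_glr(residuals, step_signature, 1.0, k=9, N=6, M=6)
```

With the wide window, the maximization picks the true onset time and recovers the step size. With N=M=6 only θ=3 is examined, and the magnitude estimate drops to 10/7 because two failure-free samples are averaged in, which is the degradation described above.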

Also, we note that even if the physical system and filter are time-invariant, the GLR monitoring
system is time-varying, as the failure signature G includes transient effects. In some cases one may be
able to neglect these and utilize a simpler steady-state signature. This is quite similar to Chien's
use [24] of the steady-state effect of the failure on the residuals, and the criticisms of that approach,
given in Section VI, apply here as well.

One can also simplify the implementation by either partially or completely specifying the failure
magnitude ν. Constrained GLR (CGLR) is based on the assumption that

ν = α f_i   (46)

where α is an unknown scalar and f_i is one of r possible failure directions. This technique is described
in [29]. If we completely specify ν

ν = ν_0   (47)

we obtain the simplified GLR (SGLR) algorithm, which is extremely simple to implement, as we have
completely eliminated the need for the matched filters to estimate ν. The use of specified failure sizes is
similar to that proposed by Chien [24], although in SGLR one can use the time-varying failure signature,
which should aid in failure detection. As initial results for the detection of electrocardiogram
arrhythmias indicate (see Gustafson, et al., [36]), the estimation of ν is not nearly as important for
detection as the matching of failure signatures. Also, by the use of several values of ν_0 (i.e., by
implementing several parallel SGLR's), one can achieve a high level of failure isolation without a great deal
of additional software complexity. In addition, one could consider a "dual-mode" procedure in which
SGLR is used for alarm and isolation, with full GLR used only afterward in order to estimate the magnitude
of the failure.
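A bank of parallel SGLR detectors can be sketched as follows for the scalar case. This is an illustration under assumptions, not the algorithm of [29]: the statistic used is twice the log-likelihood ratio for a known magnitude, 2 ν_0 d - ν_0^2 C, with d and C the matched-filter sums for a fixed θ; residuals are scalar with constant variance; and the function names and hypothesis labels are hypothetical.

```python
def sglr_statistic(residuals, signature, variance, nu0, theta, k):
    """Twice the log-likelihood ratio for a failure of known size nu0 at time theta."""
    terms = range(theta, k + 1)
    d = sum(signature(j, theta) * residuals[j] for j in terms) / variance
    c = sum(signature(j, theta) ** 2 for j in terms) / variance
    return 2.0 * nu0 * d - nu0 * nu0 * c


def parallel_sglr(residuals, signature, variance, hypotheses, theta, k, threshold):
    """Evaluate one SGLR per hypothesized magnitude; alarm on the best match.

    hypotheses: dict mapping a label to an assumed failure magnitude nu0.
    Returns (label or None, dict of scores).
    """
    scores = {
        label: sglr_statistic(residuals, signature, variance, nu0, theta, k)
        for label, nu0 in hypotheses.items()
    }
    best = max(scores, key=scores.get)
    return (best if scores[best] > threshold else None), scores


def step_signature(j, theta):
    """Hypothetical signature: a unit step in the residuals from time theta on."""
    return 1.0 if j >= theta else 0.0


residuals = [0.0] * 5 + [2.0] * 5
hypotheses = {"small_bias": 0.5, "large_bias": 2.0, "negative_bias": -2.0}
decision, scores = parallel_sglr(
    residuals, step_signature, 1.0, hypotheses, theta=5, k=9, threshold=10.0
)
```

No magnitude estimate is formed here; each detector only correlates the residuals against one fixed signature, which is why several of them can run in parallel at low cost.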

The various simplifications of GLR, as well as full GLR, are amenable to certain analysis, such as
the calculation of P_F, P_D, and (at least for SGLR) the expected time delay in detection. By performing
such analyses, one can study in detail the tradeoff between complexity and performance. A methodology
for such comparisons is presently being developed and is being applied to an aircraft failure detection
problem. Initial results are reported by Chow, et al., in [38], and a description of a detailed
methodology will be reported in the near future (see Bueno, Chow, Gershwin, and Willsky [43]). In addition
to the calculation of P_F and P_D, the comparison methodology reported in [43] includes the computation of
cross-detection probabilities, i.e., the probability of detecting a failure of type A when a failure of
type B has occurred. Such information can be useful in designing failure isolation procedures and also
in determining if failure detector A can be successfully utilized as an alarm for failures of type B.
This can lead to substantial simplifications in a failure alarm system. Also, we refer the reader to
[29], [36], and [38] for successful simulations of the GLR method.

Presently  the  GLR  method  is  being  extended  to  other  failure  modes,  such  as: 

Hard-Over  Actuator  Failure 

x(k+1) = Φ(k)x(k) + [B(k) + M σ_{k+1,θ}] u(k) + w(k)   (48)

With this model we can take into account complete (or "off") failures of certain actuators. For example,
an off failure of the ith actuator can be modeled by choosing M all zero except for the ith column, which
is taken to be the negative of the ith column of B. The GLR detector for (48) is presently under
development [38], [43], and we note that this model is more difficult than the others as the effect of the failure
is modulated by the input values u(k).
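The choice of M for an "off" failure can be made concrete with a small sketch. The helper names and the example matrix are hypothetical, matrices are plain nested lists, and the unit step is taken to switch on when k+1 reaches the failure time θ.

```python
def off_failure_matrix(B, i):
    """M is zero except for column i, which is the negative of column i of B."""
    return [[-row[c] if c == i else 0.0 for c in range(len(row))] for row in B]


def effective_input_matrix(B, M, k, theta):
    """Return B + M * sigma, where sigma = 1 once k+1 >= theta and 0 before."""
    sigma = 1.0 if k + 1 >= theta else 0.0
    return [[b + sigma * m for b, m in zip(rb, rm)] for rb, rm in zip(B, M)]


# Illustrative 2-state, 2-actuator input matrix (values are made up).
B = [[1.0, 0.5],
     [0.0, 2.0]]
M = off_failure_matrix(B, 1)
before = effective_input_matrix(B, M, k=3, theta=10)
after = effective_input_matrix(B, M, k=12, theta=10)
```

After the failure time the second column of the effective input matrix is zero, so commands to that actuator no longer affect the state. The size of the resulting residual signature then depends on what u(k) would have commanded, which is the modulation effect noted above.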

Increased Process Noise Failures

x(k+1) = Φ(k)x(k) + B(k)u(k) + w(k) + ξ(k) σ_{k+1,θ}   (49)

Here ξ is additional white process noise.

Hard-Over  Sensor  Failures 

z(k) = Hx(k) + Ju(k) + v(k) + [Mx(k) + Su(k)] σ_{k,θ}   (50)

Here the failures are modulated by u and x, and a failure of the ith sensor is modeled by choosing the


The analysis of these failure modes is presently being performed [38], [43], and it is anticipated that
SGLR algorithms will also be developed.

In addition to these failure modes, one can develop additional models along these lines for particular
applications. In particular, we have developed several additional models similar to those described
by equations (40)-(44) for our work on the detection and classification of arrhythmias in
electrocardiograms. The results reported in [36] are rather striking, as in all the tests performed we observed no
false alarms, detected all rhythm changes immediately (with no incorrect estimates of θ), and classified
all rhythm changes correctly. These tests utilized the full GLR approach and have provided useful insight
into the characteristics of the method. For example, the use of maximum likelihood estimates of ν and θ
precludes the use of a priori statistics on these variables. In the ECG problem, one is quite interested
in accurate estimates of ν, and one also can come up with reasonable a priori statistics on ν based on
physical arguments. Thus, it may pay to incorporate such a priori statistics into the GLR system, and
this can be done rather easily by proper initialization of the matched filters estimating ν.

On the other hand, for the ECG problem one does not want to look for abrupt changes at one point in the
record more than at another, and thus it does not make sense to include a priori statistics on θ. In
fact, one can argue that inclusion of a priori failure information tends to discount the observed data in
order to avoid false alarms (unless failures are extremely likely), and one should probably avoid the
inclusion of such information unless one is especially worried about false alarms. However, if one wishes
to use such data, one can utilize the interpretation of the likelihood ratios as ratios of conditional
probabilities of failure time in order to determine the appropriate modification of GLR [29].

Finally, we note that the GLR system provides extremely useful information for system compensation
subsequent to the detection of a failure. For example, one can utilize the GLR-produced estimates of ν
and θ to determine an optimal update procedure for the filter estimate and covariance [29]. Once this
update has been performed, the GLR system can be used to detect further failures, thus allowing the
detection of multiple events. We refer the reader to [29], [38] for further discussions of the use of
GLR-produced information in the design of failure compensation systems.

VIII.  Conclusions 

In this paper we have discussed a number of the issues involved in the design of failure detection
systems. We have also reviewed a variety of existing failure detection methods and have discussed their
characteristics and design tradeoffs. The failure detection problem is an extremely complex one, and
the choice of an appropriate design depends heavily on the particular application. Issues such as
available computational facilities and level of hardware redundancy enter in a crucial way in the design
decision. For example, as we have mentioned, the use of a sophisticated failure detection-compensation system
may allow one to reduce the level of hardware redundancy without much of a loss in overall system
reliability.

The development of failure detection methods is still a relatively new subject. At this time most of
the work has been at a theoretical level with only a few real applications of techniques [6]-[9], [13],
[31], [36]. Much work is yet to be done in the development of implementable systems complete with a
variety of design tradeoffs. Work is needed in the development of efficient techniques for failure
compensation and system reorganization. In addition, there is a great need for the analysis of the
robustness of various failure detection systems in the presence of variations in system parameters and in the
presence of modeling errors and system nonlinearities. For example, it is conjectured that SGLR is less
sensitive to parameter errors than the more complex full GLR; however, at present there are no analytical
results or simulations to support this conjecture. These and other issues, such as the incorporation of
fault-tolerant computer concepts into an overall reliable design methodology (see Deyst [40]), await
investigation in the future.
SUMMARY

The complementary analytic-simulative technique (CAST) evolved as it became evident
that neither analysis nor simulation alone could satisfy the evaluation requirements of
complex computer systems. Analytic modeling provides flexibility and rapid, economical
data generation. Simulation permits computer system details to be included easily, but
extensive data generation is slow and expensive. CAST permits the user to obtain the best
features of both analytic modeling and simulation. CAST is based on the desirability of
accurately modeling complex computer systems, utilizing closed-form mathematical
expressions for the computer system failure probability. This is achieved through the use of
analytic models which utilize parameters determined both by simulation and engineering
characterization of the computer system. This concept evolved under a NASA Langley
Research Center contract concerned with reconfigurable computer systems (RCS) for
commercial aircraft. The work was then continued for application to the Shuttle-orbiter
data processing subsystem (DPS) on a NASA Johnson Space Center contract.

CAST has been used to determine the survivability of one of the early test
configurations of the Space Shuttle Data Processing Subsystem (DPS). The DPS mission-critical
survivability for a six-hour mission was determined to be 0.999863 for the Shuttle ALT
baseline configuration. The analysis led to the evaluation of three selected options
which identified two areas of possible improvement. The Shuttle work included: extending
the GPC analytic model to include imperfect detectability; creating a new analytic model
to handle configurations involving non-symmetrical interconnections; creating a new
analytic model to handle combinations of dependent device sets (e.g., flight-critical bus and
connected units); adding three routines to reflect transient recovery procedure differences;
and developing a simulation for the flight-critical-bus partition.

CAST also has been applied to a number of example computer system configurations
for avionics applications to provide insight into the important aspects of these
configurations. The conclusions are based on a ten-hour flight and failure rates thought to be
applicable to the off-the-shelf avionics computers studied. The reconfigurable computer
systems were assumed to be composed of as many as five machines. It was determined that
the greatest improvement in system survivability is obtained by increased redundancy.
Each increment of redundancy decreases the 10-hour failure probability by approximately
two orders of magnitude. The greatest failure probability decrease occurs when changing
from triplex to quadruplex, a 200-fold improvement. Increasing redundancy also
increases cost in terms of power, weight, and volume, not only due to the added units but
also due to the increased complexity of intercommunications modules, external electronics
modules, and bus switches.

Increasing redundancy has diminishing returns if there are errors in permanent-recovery
algorithm design. This error penalty becomes more severe with added redundancy.
Using simpler recovery algorithms, i.e., those involving less RCS adaptivity, is a possible
way of ensuring error-free recovery. However, the increase in failure probability
for air-transport-type missions due to decreased adaptivity (e.g., not adapting the system
down to one computer) is less than that caused by decreased redundancy or recoverability.
Since redundancy has such a large effect on failure probability, external hardware
should have an equivalent redundancy to prevent external failures from depressing
the overall survivability.

The analytic modeling has resulted in the successful formulation of survivability
expressions for an RCS composed of N computers, where N is any finite, positive
integer.  The  models  included  explicit  consideration  of  the  components  of  coverage  and 
the  effects  of  transient  faults.  Hardware/software  interactions  are  included  implicitly. 

The  simulator  portion  of  CAST  is  based  on  descriptions  of  the  states  the  RCS  could 
assume  and  the  transitions  between  these  states.  Use  of  this  approach  results  in  a fault- 
driven  simulation  which  minimizes  the  simulation  cost.  The  RCS  simulator  requires  that 
the  fault  environment,  the  configuration  particulars,  the  system  failure  criteria,  the 
software  structure,  the  test  features,  and  the  recovery  approaches  be  specified.  The 
modeling  parameters,  plus  other  useful  data  such  as  state-transition  statistics,  result 
from  the  simulation. 

The  techniques  reported  here  devote  much  attention  to  the  modeling  of  transient 
faults. The results show that knowledge of the transient environment leads to effective
transient recovery features. Underestimating transient duration results in many
transients  being  recorded  as  permanent,  while  overestimating  transient  duration  leaves 
the  system  unduly  vulnerable  to  further  faults. 
Device type-A (ith unit)

Device type-B (ith unit)

Controller i

Transient duration

Reliability of the controller at time T

Survivability of system at redundancy level i (probability that system functions correctly
over a mission of length T)

Survivability of device A at redundancy level i

Survivability of device B at redundancy level i

Mission duration

Coverage, defined as the product of u_i, v_i, and w_i; i.e., the probability the system
recovers given a fault occurs

Transient leakage (probability of failure of transient recovery given that fault is transient)

Detectability (probability fault is detected given fault occurs)

Diagnosability (probability fault is isolated given fault is detected)

Recoverability (probability system recovers given fault is detected)

Permanent fault rate

Transient fault rate
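The coverage definition above is a product of three conditional probabilities. The sketch below assumes that the three factors are the detectability, diagnosability, and recoverability listed in the glossary, which the text suggests but does not state explicitly. The function name and the numerical values are illustrative only.

```python
def coverage(detectability, diagnosability, recoverability):
    """Probability the system recovers given that a fault occurs (product form)."""
    for p in (detectability, diagnosability, recoverability):
        if not 0.0 <= p <= 1.0:
            raise ValueError("each factor must be a probability in [0, 1]")
    return detectability * diagnosability * recoverability


# Made-up factor values, chosen only to show the arithmetic.
example = coverage(0.99, 0.98, 0.97)
```

Even with each factor close to one, the product is noticeably lower (about 0.941 here), which is why imperfect detection, isolation, or recovery can limit the benefit of added redundancy.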


1.  INTRODUCTION 

As more and more demands and responsibility are placed on the computer in a flight
control system, the need for computer fault tolerance has increased. As with any
non-physical feature of an avionics subsystem, it is necessary to estimate various parameters
before, during, and subsequent to the design process. Thus it is necessary to evaluate
the fault tolerance of the computer subsystem in a flight control system.

This paper describes the first two phases of work being performed for the U.S.
National Aeronautics and Space Administration (NASA). The approach described here is
different from those taken in the past in that it is more comprehensive and detailed.
The work is designed to produce usable tools, i.e., tools that can actually be put to
work in the engineering design process of a flight control system.

The main body of the paper is divided into six sections. The first of these is
intended to provide the reader with an overview of the concept before he becomes enmeshed
in the mathematics and flow charts. The second section describes various fault-tolerance
concepts so that the paper can be read without extensive perusal of references.
Following these introductory sections, the analytic models and simulative models are described
in the next two sections. The next section describes the applications of the technique
which have been made to date. Finally, conclusions are presented.

2.  THE  COMPLEMENTARY  ANALYTIC-SIMULATIVE  TECHNIQUE

The  Complementary  Analytic-Simulative  Technique  (CAST)  evolved  as  a result  of  a study 
performed  for  NASA  Langley  Research  Center.  The  objective  of  the  study  was  to  provide 
concepts and engineering data from which a highly reliable, fault-tolerant, reconfigurable
computer  system  (RCS)  for  aircraft  applications  could  be  designed.  For  the  purposes  of 
the  study,  an  RCS  was  defined  to  be  a redundant  configuration  of  off-the-shelf  avionics 
computers  which  achieved  fault  tolerance  through  use  of  a variety  of  recovery  techniques. 

A principal  study  goal  was  the  development  and  application  of  reliability  and  fault- 
tolerance  assessment  techniques.  Particular  emphasis  was  placed  on  the  needs  of  an  all- 
digital,  fly-by-wire  control  system  appropriate  for  a passenger-carrying  airplane.  A 
representative  set  of  results  obtained  from  applying  CAST  to  an  RCS  is  shown  in  Figure  1. 

This Complementary Analytic-Simulative Technique (CAST) evolved as it became evident
that neither analysis nor simulation alone could satisfy all the RCS evaluation
requirements. Analytic modeling provides flexibility and rapid, economical data generation.
However,  the  solutions  for  some  configurations  are  very  cumbersome  and,  in  certain  cases, 
the  mathematical  model  formulated  is  intractable.  Simulation  permits  computer  system 
details  to  be  included  easily,  but  data  generation  is  slow  and  expensive.  CAST  permits 
the  user  to  obtain  the  best  features  of  both  analytic  modeling  and  simulation.  When  these 
two  system  evaluation  approaches  are  combined,  and  are  supplemented  by  an  engineering 
characterization  of  the  system,  a very  powerful  technique  results.  The  combination  is 
illustrated  in  Figure  2. 

The engineering characterization is performed to provide six categories of information
to the analytic modeling and the simulation. These information categories are:
(1) configuration particulars, (2) fault environment, (3) system failure criteria, (4) software
structure, (5) recovery features, and (6) test features. The individual items in these
six categories are shown in Figure 2. The following items are available as simulator
outputs: (1) permanent-fault coverage, (2) transient-fault coverage, (3) detectability,
(4) diagnosability, and (5) recoverability. The analytic modeling provides the following
measures of fault tolerance: (1) computer system survivability (or failure probability),
and (2) computer system reliability.
3.  CONCEPTS  OF  FAULT  TOLERANCE

There are two general methods of measuring the effectiveness of the fault tolerance.
One way is to count the worst-case number of faults that a system will tolerate before
failure. An example of this is a fail operational/fail safe (FO/FS) specification. The
specification of the number of faults to be tolerated gives the user a feeling for the
redundancy employed and gives him the reassurance that a fault will not bring down the
entire system. But a measure is needed to specify the probability that the system will
not fail over a time span. This other measure is generally called the mission success
probability. The mission success probability that is most commonly used is the
reliability. This is the probability that at least one set of functioning hardware survives
the entire mission. Reliability considers only permanent hardware faults. Survivability
is the measure we shall consider here. It considers the effects of transient faults as
well as permanents in its results, and is the result of the application of CAST.
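The reliability definition above can be sketched for the simplest case. The model below is a textbook simplification assumed here for illustration and is not the CAST analytic model: the computers are identical and independent, each has a constant permanent fault rate, and the system works as long as at least one unit survives (that is, coverage is perfect and transients are ignored). The function names and the numerical rate are hypothetical.

```python
import math


def unit_failure_probability(fault_rate_per_hour, mission_hours):
    """Probability that one unit suffers a permanent fault during the mission."""
    # expm1 keeps precision when the exponent is very small.
    return -math.expm1(-fault_rate_per_hour * mission_hours)


def reliability(n_units, fault_rate_per_hour, mission_hours):
    """Probability that at least one of n independent units survives the mission."""
    q = unit_failure_probability(fault_rate_per_hour, mission_hours)
    return 1.0 - q ** n_units


# Hypothetical numbers: 1e-3 permanent faults per hour, ten-hour mission.
q = unit_failure_probability(1e-3, 10.0)
triplex_failure = q ** 3
quadruplex_failure = q ** 4
```

Under these assumed numbers each added unit multiplies the failure probability by q, roughly a factor of 100. Survivability differs from this figure because it also accounts for imperfect coverage and transient faults.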

Configurations  Under  Consideration 

CAST is applicable to multicomputer systems which are formed by interconnecting
identical computers into a symmetrical fault-tolerant configuration. These computers perform
identical functions in synchronism so that valid results are available in case of an
erroneous computation. The failure of an individual computer is handled by switching it out
of the system and continuing system operation with one less computer. Ideally, the system
could lose, because of component failures, all but one computer and still perform properly.
However, unless the mechanisms for fault detection, fault location, and system recovery
are perfect, the system may not recover from a fault even though adequate redundancy is
available. It is important to note that the input/output bus system and devices also
must be protected by redundancy in order to guarantee the critical data processing tasks
are performed. The important features of a fault-tolerant multicomputer system are:
(a) synchronization of inputs, outputs and task performance; (b) fault detection and
diagnosis; and (c) transient and permanent recovery.

Fault-Handling  Methods 

Fault detection is the antecedent of actions necessary to recover from a fault. In a
multicomputer system, fault detection is accomplished either by cooperative action between
computers or by self-detection methods. Cooperative action includes such methods as
voting or comparison of results. Voting may be a compilation of agreements or
disagreements of comparisons between three or more machines, or it may be performed by a hardware
voter. Location of the faulty machine is immediate from detection by voting, but
detection by comparison between two machines (as in duplex) still leaves the faulty machine to
be resolved by self-test. Self-detection methods include hardware built-in test equipment
or software diagnostics. Examples of hardware built-in test in current computers include
such things as memory parity, watchdog timer, illegal operation code, or power failure.
These methods seldom achieve high coverage by themselves, but in conjunction with software
diagnostics they are the only method of resolving the faulty computer after fault
detection by comparison between results from two remaining computers.
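The difference between voting and two-machine comparison can be sketched in a few lines. The function `vote` and the computer labels are hypothetical, and outputs are compared for exact equality, which is a simplification (a real system would typically compare within a tolerance).

```python
from collections import Counter


def vote(outputs):
    """Majority vote over the outputs of redundant computers.

    outputs: dict mapping a computer label to its result for this cycle.
    Returns (majority_value, faulty_labels). If no strict majority exists,
    returns (None, []) because voting alone cannot locate the fault.
    """
    counts = Counter(outputs.values())
    value, n_agree = counts.most_common(1)[0]
    if 2 * n_agree <= len(outputs):
        return None, []
    faulty = sorted(label for label, v in outputs.items() if v != value)
    return value, faulty


triplex = vote({"A": 42, "B": 42, "C": 41})
duplex = vote({"A": 42, "B": 41})
```

In the triplex case the dissenting computer is identified directly. In the duplex case the disagreement is detected but neither machine can be singled out, so self-test is needed to resolve which one is faulty.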

Faults may be either transient or permanent. A permanent fault is a hardware failure
that prevents the computer from functioning. Recovery may be effected by ignoring the
faulty machine, removing its power, masking by the voters, or by other methods of removal
from the redundant set. Before removal we must first determine whether the fault was
caused by a transient, and if it was, restore the faulty machine. The method of
transient determination is usually to attempt a recovery sequence. If the error has been
removed, the fault will not be redetected.

Transient faults are caused by a temporary
malfunction that lasts for a relatively short duration. Longer duration faults can last
sufficiently long to be recorded as permanent failures. Shorter transients, however, may
leave one or more errors in program and/or data during their stay. Transient recovery
methods are designed to correct these transient-caused errors. Destructive readout
memories are particularly vulnerable to transients during the restore cycle. Data may be
altered during a read cycle. Data may be altered by transients in the CPU or memory.

Rollahead and rollback are two methods of recovering from errors in the current data in
order to continue computation. Rollback is a self-recovery scheme where the computational
segment in which the error was detected is repeated with the previous data state.
Rollahead is a cooperative recovery method where the valid current results are passed from the
good computer to the disagreeing computer and computation resumes. Two or more good
computers are required for rollahead while rollback is independent of the redundancy level.
Neither of these methods corrects program memory faults. There is a cooperative method
called memory copy that accomplishes this. Here the memory contents of the good computers
are voted into the disagreeing machine at a low duty cycle. This allows real-time
computations to continue. After the memory copy is completed, a rollahead is performed to
place the recovered computer into synchronization. Memory copy recovery leaves the
system with degraded redundancy for a period of time, but it corrects many transients that
would otherwise be mistakenly labeled permanent.
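Rollback, and the way a recovery attempt separates transient from permanent faults, can be sketched as follows. The names `run_with_rollback` and `make_flaky_segment`, the state layout, the error check, and the retry limit are all hypothetical; the transient is simulated by corrupting the first few runs of the segment.

```python
def run_with_rollback(segment, saved_state, check, max_retries=2):
    """Repeat a computational segment from the saved data state until it checks out.

    Returns (state, label), where label is "ok" if the first run passed,
    "transient" if a retry succeeded, and "permanent" if every attempt failed.
    """
    for attempt in range(max_retries + 1):
        candidate = segment(saved_state)  # segment must not modify saved_state
        if check(candidate):
            return candidate, ("ok" if attempt == 0 else "transient")
    return saved_state, "permanent"


def make_flaky_segment(bad_runs):
    """Build a segment whose first `bad_runs` executions are corrupted."""
    remaining = {"n": bad_runs}

    def segment(state):
        new_state = dict(state)          # work on a copy of the saved state
        new_state["x"] = state["x"] + 1  # the nominal computation
        if remaining["n"] > 0:
            remaining["n"] -= 1
            new_state["x"] += 1000       # simulated transient upset
        return new_state

    return segment


def check(state):
    """Hypothetical reasonableness test standing in for fault detection."""
    return state["x"] < 100


clean = run_with_rollback(make_flaky_segment(0), {"x": 1}, check)
short = run_with_rollback(make_flaky_segment(1), {"x": 1}, check)
long_lived = run_with_rollback(make_flaky_segment(5), {"x": 1}, check)
```

The last case shows the effect described above: a disturbance that outlasts the retry budget is recorded as permanent even though it would eventually have cleared.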

4.  CAST  APPLICATIONS 

CAST  has  been  applied  to  the  space  Shuttle-orbiter  data  processing  system  (DPS)  for 
the  approach  and  landing  test  (ALT)  configuration  to  obtain  mission  survivability  and 
evaluate  possible  design  modifications.  Before  describing  the  applications,  a brief 
description  of  the  DPS  will  be  presented. 
Shuttle  Data  Processing  System  (ALT)

The  Shuttle  DPS  is  composed  of  five  identical  general-purpose  computers  (GPCs) 
connected  with  each  other  and  the  peripheral  devices  by  a system  of  redundant  serial 
buses.  In  the  ALT  configuration  the  fifth  GPC  is  set  aside  as  backup  flight  control 
for  the  landing  tests  and  is  not  included  in  the  modeling.  All  functions  that  are  not 
related  to  the  return  flight  are  not  in  the  ALT  configuration  and  are  hence  not  modeled. 
The resulting ALT configuration is shown in Figure 3. As shown in the figure,
communication among the GPCs, and between the GPCs and/or the peripheral devices, is effected
through use of seven groups of buses. The number of buses in each group is shown on the
figure. Each of these buses is a one-megahertz, serial bus. Communication between units
on a bus is accomplished through use of command words, command data words, and response
data words. Each GPC is composed of a central processing unit (CPU), memory, and an
input/output processor (IOP). All information transfers to and from the GPCs are handled through
the IOP. Software control is used to instruct each bus within a data-bus group whether it
is to operate in the command or listen mode. When operating in the command mode, data
requests and commands are sent to the peripheral equipment and the data are then supplied
over the same bus. When in the listen mode, data are only received on the bus.

FIGURE 3 SHUTTLE ORBITER COMPUTER SYSTEM BLOCK DIAGRAM (ALT)

The bus configuration allows each computer to have access to all flight-critical data
received or transmitted by the other computers. Each of the redundant subsystems is
connected to a different bus. Hence for data input, a different computer requests data from
each of the subsystems. The requested data are then available to all other computers.
Thus identical input data are available to each computer in the DPS. For data output,
since each channel of the actuator subsystem is connected to a different bus of the group,
a different computer transmits command data to each of the voting actuator channels. As
a result of the bus-computer interconnections, each computer can monitor the command data
sent out by each of the other computers. When data are to be transferred between computers,
each computer communicates with all other computers through the intercomputer communication
(ICC) buses. Only the GPCs are connected to the ICC buses. In order to avoid data skew
of either inputs or outputs, synchronization is accomplished in the DPS through use of
intercomputer discrete signals and synchronization software. Sensors and actuators are
connected to the appropriate bus through multiplex-demultiplex (MDM) units. Analog
flight display units are connected to their bus through display driver units (DDU), while
the multifunction CRT display system (MCDS) is connected through display electronic units
(DEU). The pulse code modulation master units (PCMMU) are connected directly to the GPCs
by dedicated buses. The mass memory, launch, and payload buses are not applicable to ALT
but are included in the figure for completeness.

Fault detection in the GPCs is accomplished by five methods: compare-word sum check,
bus channel timeout test, built-in test equipment, watchdog timer, and self-test programs.
The compare-word sum check involves summing critical GPC actuator-command outputs, with
each GPC comparing its sum with that of the others. This check is performed each computation
cycle, and the comparison is performed by the Fault Detection Identification
Program. If the difference is greater than that allowable and has occurred the maximum
permissible number of times, then the fail-discrete of the faulty GPC is set. There are
two recovery approaches available in the Shuttle GPC configuration. The first of these
is one in which the crew identifies a failed GPC through use of the "failed-discrete"
and may either switch out the failed machine or try an initial program load (IPL). The
IPL approach is used when there is reason to believe that a transient fault has been
experienced. The second recovery approach is for the crew to enable inhibition of transmission
of outputs from the failed GPC. This inhibition is accomplished automatically once it
has been enabled by the crew. It should be noted that restoration of a GPC that may have
suffered a transient is not attempted during the action portion of ALT. This is because
of the stringent recovery time constraints and the fact that restoring and adding a computer
to the redundant set during time-critical mission phases requires a significant
amount of computer memory and time and introduces greater than desirable operational
complication. Fault detection in the peripheral units of the DPS is accomplished by a
combination of BITE and GPC-supervised tests. The recovery approach used depends upon
the particular unit.
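
The compare-word sum check described above can be illustrated with a small sketch. It is not the flight software: the class name, the tolerance, the miscompare limit, and the rule that a computer is charged with a miscompare only when it disagrees with every other active computer are all assumptions made for illustration. The text states only that a difference beyond the allowable value, occurring the maximum permissible number of times, sets the fail-discrete.

```python
from dataclasses import dataclass, field


@dataclass
class CompareWordMonitor:
    """Hypothetical sketch of a compare-word sum check (not the flight code)."""
    n_computers: int
    tolerance: float        # allowable difference between output sums (assumed)
    max_miscompares: int    # permissible number of miscompares (assumed)
    counts: list = field(init=False)
    fail_discrete: list = field(init=False)

    def __post_init__(self):
        self.counts = [0] * self.n_computers
        self.fail_discrete = [False] * self.n_computers

    def cycle(self, sums):
        """Run one computation cycle; sums[i] is the output sum of computer i."""
        active = [i for i in range(self.n_computers) if not self.fail_discrete[i]]
        for i in active:
            others = [j for j in active if j != i]
            # Assumption: a computer miscompares when it disagrees with every
            # other active computer; at least two others are needed to isolate it.
            disagrees = len(others) >= 2 and all(
                abs(sums[i] - sums[j]) > self.tolerance for j in others
            )
            if disagrees:
                self.counts[i] += 1
                if self.counts[i] >= self.max_miscompares:
                    self.fail_discrete[i] = True
        return list(self.fail_discrete)


# Computer 3 produces a persistently different sum for three cycles.
monitor = CompareWordMonitor(n_computers=4, tolerance=0.5, max_miscompares=3)
for _ in range(3):
    flags = monitor.cycle([10.0, 10.0, 10.1, 12.0])
print(flags)
```

With only two active computers this sketch never sets a fail-discrete, which mirrors the point that comparison alone cannot tell which of two disagreeing machines is faulty.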

Shuttle  Data  Processing  System  Partitioning 

The Shuttle DPS is a complex system. It consists of extensive peripheral devices for
the control of the vehicle and its payload through launch, orbit maneuvers, and the return
flight. The exact modeling of this complex system as a whole is not feasible because of
the mathematical complexity necessary to account for unit interactions. A systematic
method of reducing the problem to solvable pieces is required. Such a method is the
partitioning of the system into statistically independent module sets. By independence
of module sets, we mean independence with respect to the impact of faults from one set to
the other. A definition of independence is as follows: Given a collection of module
sets, the sets are independent of each other if a faulty module within one set does not
incapacitate modules within any other set. However, within each independent module set,
a failure of one module type has an effect on other module types. For example, a CPU
fault would cause its IOP to not function properly, and an MDM failure would prevent
access to the devices it services. Having defined the independent partitions, the
survivability of each partition may be determined independently and the system survivability
is the product of the survivabilities of the partitions.
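
As a minimal sketch of the product rule just stated, the following computes a system survivability from partition survivabilities. The partition names and numerical values are hypothetical and serve only to show the arithmetic.

```python
import math


def system_survivability(partition_survivabilities):
    """Product of the survivabilities of statistically independent partitions."""
    values = list(partition_survivabilities)
    for s in values:
        if not 0.0 <= s <= 1.0:
            raise ValueError("each survivability must lie in [0, 1]")
    return math.prod(values)


# Hypothetical partition survivabilities, for illustration only
partitions = {"partition 1": 0.9999, "partition 2": 0.9990, "partition 3": 0.9995}
s_system = system_survivability(partitions.values())
failure_probability = 1.0 - s_system  # failure probability is one minus survivability
print(f"system survivability = {s_system:.6f}")
```

Because every factor is at most one, the system survivability can never exceed that of its weakest partition.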

The first-cut partitions are along the lines of the bus groups. These groups are:
the four general-purpose computers (GPC); the flight-critical buses and connected
equipment (FCB); the display equipment and their buses (MCDS); and the flight instrumentation
and buses (PCM). A failure of one of these groups has a different impact on the Shuttle
mission depending on the group. There are two levels of failure criticality:
safety-critical and mission-critical. Safety-critical failures threaten the Shuttle vehicle
and the lives of the crew, while mission-critical failures affect the accomplishment of
the mission. A bus group falls into one of these two categories. The safety-critical
partitions for ALT are: the GPCs, the flight-critical bus group, and the MCDS. A
safety-critical failure is also mission-critical since a lost vehicle implies an
unsuccessful mission. Therefore, safety-critical partitions are also mission-critical. The
flight instrumentation is mission-critical.

The flight-critical bus system consists of 8 buses connected to 4 forward MDMs, 4
aft MDMs, and 2 DDUs. Failure in one of these module groups does not affect the other
module groups. Bus failures do affect more than one module group, but the bus failure
rate is very small compared to those of the modules. Because it is small, the bus
failure rate can be included with each of the module groups with a very small resultant
error. The result is a slightly pessimistic estimation of the survivability. Therefore,
the forward MDMs, aft MDMs, and DDUs, with the buses attached to each, constitute three
more partitions.

An additional aspect of complexity is the extensive input/output network. Additional
analytic models and extensive simulator expansions are required to model these. An
example of this complexity is the MCDS. Here there are two keyboards serving three display
units with an unusual switching arrangement. Another area is the flight-critical bus.
Here the MDMs or DDUs serve as peripheral controllers serving several devices. A device
failure does not incapacitate other devices. However, an MDM or DDU failure incapacitates
all devices served by it.

Shuttle  (ALT)  Results 

In order to make mission success probability calculations using the analytic models,
it is necessary to obtain, by simulation, values for the various parameters required in
these models. The approach taken was to obtain a baseline set of parameters and then
vary these parameters to reflect the several options investigated. Approximately 200
parameters for both the models and simulator are required to be specified. The
survivability results for the safety-critical partitions, as well as the overall safety-critical
and mission-critical results, are given in Figure 4. Failure probability, which is one
minus survivability, is shown on a logarithmic scale as a function of time. It shows that
the forward flight-critical MDMs and devices are the greatest contributor to the overall
failure probability. This is due to lower coverage on the devices than the GPCs, the
devices' relatively high failure rates, and a lower device redundancy.

FIGURE 4 SHUTTLE BASELINE FAILURE PROBABILITIES

Figure 5 graphically illustrates how increased redundancy improves failure probability.
The figure is a log-log plot with mission times from one to 1000 hours and failure
probabilities from 10^-[illegible] to 0.1. The higher the level of redundancy, the better the failure
probability prediction, as one would expect. The GPCs have a higher redundancy level than
the flight-critical devices, so that their survivability prediction is greater. But any
additional increases in GPC redundancy gain nothing unless the other system components
are also increased in redundancy. Figure 6 demonstrates how improvements in detectability,
diagnosability, and recoverability improve the mission failure probability. The figure
shows how the profile of failure probability versus time is affected by different
recoverability values for the four computers. The straight-line portion is for the perfect
coverage case. For imperfect recoverability, the failure probability for early mission times
is dominated by coverage until it "catches up" with the perfect coverage case. It does
not immediately join the perfect coverage failure probability curve, but approaches it as
an asymptote. For shorter mission times, such as in the Shuttle ALT, there is much to be
gained by improving the coverage components. Adaptive configurations can adjust to a new
fault-tolerant scheme when units have been recorded as faulty. Non-adaptive configurations
are duplex, TMR, QMR (three-out-of-five vote), and two-out-of-n votes (n = 4, 5). Because
a two-out-of-five vote is capable of recovering from a fault with three GPCs remaining,
it is, in a sense, more adaptable than QMR, which fails after a fault with three computers.
Quintuplex, quadruplex, and triplex are adaptive cases where voting is the prime method of
fault detection and diagnosis with three or more fault-free computers. The residual duplex
mode is entered when two computers remain. The resulting failure probabilities versus
time are shown in Figure 7.

Transient Recovery Effectiveness. The section on the concepts of fault tolerance
discussed some transient fault recovery methods. Here we show what improvements are possible
from the present Shuttle recovery method. The Shuttle transient recovery method, which is
a delay before attempting a permanent-fault recovery, is quite effective for transient
faults occurring external to the GPCs. This is due to the filtering of the processing
algorithms and the slow response time of the actuators and displays. This recovery method
is not as effective for transients within the GPC. It is easy for a program to be altered
by a memory transient during a restore cycle. Also, CPU and IOP transients can alter data.
Thus, a GPC can be left with a "permanent" fault actually resulting from a transient. The
three alternate transient-fault recovery options studied here are rollback, rollahead, and
a combination of rollahead and memory copy. As an instructive additional exercise, we
examine the case where the GPC memory is non-destructive readout (NDRO). This is not the
case for Shuttle, but the results are interesting. Rollback is the procedure where the
current program segment is rerun following fault detection. Rollahead is the procedure
where the fault-free GPCs pass the current machine-state and data points to the indicated
faulty machine and continue computation. Memory copy is the procedure where the contents
of the memories of the good GPCs are passed to the faulty GPC at a low duty cycle on a
cycle-stealing basis. Memory copy is followed by a rollahead after completion to bring
the faulty GPC on line. The results are given in Figure 8 as a plot of six-hour failure
probability versus transient fault rate. Delay recovery exhibits the greatest sensitivity
to transient faults. Rollahead is not as effective as rollback because it is not
applicable in duplex. Memory copy is the best recovery method, but using an NDRO memory (with
rollback or rollahead) does the best job of reducing the effects of transient faults.
This is because there is no restore cycle in an NDRO memory and it will not suffer
transient damage during a read operation.


FIGURE 5 COMPARISON OF REDUNDANCY LEVELS

FIGURE 6 RECOVERABILITY EFFECT ON GPC FAILURE PROBABILITY

FIGURE 8 EFFECTIVENESS OF TRANSIENT RECOVERY METHODS OVER A RANGE OF TRANSIENT-FAULT ENVIRONMENTS
The generation of analytic models is a five-step process. The steps to be taken to
find expressions for survivability are as follows:

a. Characterize transient faults;

b. Characterize the effectiveness of fault coverage;

c. Formulate a fault occurrence/recovery status state diagram;

d. Form equations;

e. Solve.

The methodology will be illustrated by the simple example of a triple-modular-redundant
(TMR) system. After the initial consideration of steps a and b above, the results will
hold for the more complex configurations that will follow.

Transient  Fault  Characterization 

A transient fault is a fault that disappears some time after its arrival. During its
stay it alters the contents of registers and/or memory and/or disrupts the normal sequence
of program execution. We recover from a transient that has passed by restoring altered
data and/or program and by bringing the recovering computer into synchronization with the
fault-free computers. We will characterize transient faults by their arrival and their
duration. In our arrival characterization, we make the assumption that transient faults
arrive with an average rate $\tau$ that is constant during a mission. With the constant rate
over time, the probability of the arrival of a transient fault in a small interval of
time, $dt$, is $\tau\,dt$. It is well known that under these conditions (see Refs. 1 and 2) the
probability of exactly $k$ transient fault arrivals between 0 and $t$ obeys a Poisson
probability law. That is,

$$\Pr[k \text{ arrivals in } (0,t)] = e^{-\tau t}\,\frac{(\tau t)^k}{k!}$$

If we let $k = 0$, we have

$$\Pr[\text{No transients in } (0,t)] = e^{-\tau t}$$

which is analogous to the characterization of permanent faults and leads to an analogy
between $\lambda$, the permanent fault rate, and $\tau$. We assume transient faults have a definite
duration. There is a dilemma concerning the probability density function of the duration:
We could be of the opinion that short transients are much more likely than long transients,
which would lead us to an exponential density as a mathematically tractable approximation.
We could also be of the opinion that there is a definite mean duration with an associated
spread, which would lead us to the gamma, normal, or Weibull densities as an approximation.
We could also say that transient faults are caused by several sources, each source with a
different average duration. But there are more sources with a small duration than with a
large duration. In this case, the composite density function of all the durations could
be a "lumpy" exponential.
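
The Poisson arrival law for transient faults can be evaluated directly. The sketch below is illustrative only; the function names and the numerical rate are assumptions, not values from the study.

```python
import math


def prob_k_transients(k, tau, t):
    """Pr[k arrivals in (0, t)] for a Poisson process with constant rate tau."""
    if k < 0 or tau < 0 or t < 0:
        raise ValueError("k, tau and t must be non-negative")
    mean = tau * t
    return math.exp(-mean) * mean**k / math.factorial(k)


def prob_no_transients(tau, t):
    """Special case k = 0: probability that no transient arrives in (0, t)."""
    return math.exp(-tau * t)


# Hypothetical transient fault rate (per hour) and a six-hour mission
tau, t = 1.0e-3, 6.0
p0 = prob_k_transients(0, tau, t)
p_at_least_one = 1.0 - p0
print(f"Pr[no transients] = {p0:.6f}, Pr[at least one] = {p_at_least_one:.6f}")
```

The k = 0 case has the same exponential form as the usual reliability expression for permanent faults, which is the analogy the text draws between the two rates.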

Fault  Recovery  Process 

When faults occur in a fault-tolerant computer system, certain events must take place
for the eventual recovery and resumption of computations. This sequence is called the
fault-recovery process. After fault occurrence, it must be detected. After detection,
the fault must be correctly located (diagnosis). And after diagnosis, an appropriate
recovery action must take place. This sequence of detection, diagnosis, and recovery is
complicated when transient fault recovery is attempted. The sequence of events that occur
is shown in Figure 9. The fault is first detected; then a recovery delay may be invoked
to allow the transient phenomenon to subside. (Transient recovery methods need no
diagnostic procedure.) After the delay, a recovery mode is entered. First, transient recovery
is attempted. If it is successful, recovery is complete without redundancy degradation.
If it fails, a permanent fault is assumed, and another level of diagnosis and recovery is
entered upon.

Detection, diagnosis, and recovery each have two parameters associated with them: a
probability of success and a time to completion. Time is a simulator function that is
used, in conjunction with other parameters, to provide the success probabilities to the
models. These success probabilities are detectability, $u$; diagnosability, $v$; and
recoverability, $w$; and are defined as:

$u \triangleq \Pr[\text{Fault is detected} \mid \text{fault occurs}]$;
$v \triangleq \Pr[\text{Fault is located} \mid \text{fault detected}]$;
$w \triangleq \Pr[\text{System recovers} \mid \text{fault is located}]$,

where $\triangleq$ means "equal by definition." The product of $u$, $v$, and $w$,

$$uvw = c = \Pr[\text{System recovers} \mid \text{fault occurs}],$$

is the well-known coverage parameter (see Ref. 3). Previous models include this parameter,
but do not break it into its component parts. These parameters will be given subscripts
in the sequel. The value of the subscript represents the number of fault-free units
remaining, and hence the level of residual (or original) redundancy to which the
parameter applies.
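
A short sketch of how coverage is assembled from its three components, with one coverage value per redundancy level as the subscript convention suggests. The numerical values are hypothetical.

```python
def coverage(u, v, w):
    """Coverage c = u * v * w from detectability, diagnosability, recoverability."""
    for name, p in (("u", u), ("v", v), ("w", w)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1]")
    return u * v * w


# Hypothetical parameters indexed by the number of fault-free units remaining
parameters = {
    3: dict(u=1.0, v=1.0, w=0.9999),  # voting masks the first fault
    2: dict(u=1.0, v=0.95, w=0.98),   # duplex must diagnose by self-test
}
coverage_by_level = {n: coverage(**p) for n, p in parameters.items()}
print(coverage_by_level)
```

Keeping the three factors separate makes it possible to see which stage of the fault-recovery process limits coverage at a given redundancy level.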

State  Diagram  for  a TMR  Configuration 

We are now prepared to analyze a TMR system. We begin by drawing a fault occurrence/
recovery status state diagram. This diagram represents the fault/recovery status of the
system. Each state represents the number of fault-free units in the system and the level
of fault recovery being undertaken. The transitions between states represent the
occurrence of status-changing events. The events are random in general, so that the state
diagram is probabilistic in nature. Such a state diagram is illustrated in Figure 10. From
the no-faults state a transient fault moves us into the transient recovery state. From
the transient recovery state one of three things may occur:

a. Successful recovery: go to No-Faults;

b. Transient mistaken for a permanent: go to One Computer Faulty;

c. Fault in a previously fault-free computer during recovery: go to System Failure.

FIGURE 10 FAULT RECOVERY STATE DIAGRAM OF A TMR CONFIGURATION

A permanent fault will certainly be interpreted by the recovery procedure to be permanent.
Therefore, for analysis purposes, a permanent fault in the no-faults state moves us to the
one computer faulty state without passing through the transient recovery state. If a
fault occurs in one of the two remaining computers in the one computer faulty state, the
system fails. In TMR without transient recovery, the first fault is masked by the voter.
Therefore, $u_3$, $v_3$, and $w_3$ are 1. In the transient recovery TMR (enhanced TMR), $u_3$ and $v_3$
are 1, but $w_3$ is not quite 1 because a fault may strike down a good computer during
recovery. However, it is very close to 1 (on the order of 1 - 10^-[illegible]).

Formulation  of  Equations 

The first consideration before developing equations is to define the parameters to be
used. They are as follows:

$\lambda \triangleq$ Permanent Fault Rate;

$\tau \triangleq$ Transient Fault Rate;

$\ell \triangleq \Pr[\text{Transient Fault is interpreted as a Permanent}]$;

$T \triangleq$ Mission Time;

$\sigma \triangleq \lambda + \ell\tau$.

The quantity $\ell$ is called the transient leakage and is the probability that the
transient recovery procedure fails given a transient occurs. We also need the recoverability
parameter, $w_3$, with a redundancy level of 3. The quantity $\sigma$ is the rate of permanents and
leaky transients. The TMR survivability is the sum of two mutually exclusive quantities.
The first quantity is the probability there are no permanents or unrecovered transients
during the mission. The second is the probability that a permanent or leaky transient
occurs, and the system survives the remainder of the mission. Using these two quantities,
the survivability is formulated as:

$$S(T) = e^{-3\sigma T} + \int_0^T \Pr[\text{No permanents or leaky transients in } (0,t);\ \text{a permanent or a leaky transient that is recovered at } t;\ \text{and no transients or permanents in the two remaining computers in } (t,T)]\,dt$$

which becomes

$$S(T) = e^{-3\sigma T} + \int_0^T 3\sigma w_3\, e^{-3\sigma t}\, e^{-2(\lambda+\tau)(T-t)}\,dt$$

and has a solution

$$S(T) = e^{-3\sigma T} + \frac{3\sigma w_3}{3\sigma - 2(\lambda+\tau)}\left[e^{-2(\lambda+\tau)T} - e^{-3\sigma T}\right]$$

where $\lambda + \tau$ is the rate at which each of the two remaining computers suffers a permanent
or transient fault.

Modeling  a Duplex  Configuration 

The duplex configuration is a step up in complexity from the TMR because a
determination of the faulty computer is required. Fault detection is accomplished by results
comparison between the two computers. Transient fault recovery is accomplished by rollback.
If rollback fails, the faulty computer must be diagnosed by self-test methods. After
diagnosis the faulty computer must not interfere with the operation of the good computer.
Detectability is given a value of 1 while $v_2$ and $w_2$ lie between 0 and 1. In simplex,
transient recovery is possible by using built-in test hardware to detect a fault and
rollback to correct the transient error. The definitions of the following parameters
further illustrate the subscript method of specifying current redundancy.

$\ell_2 \triangleq \Pr(\text{Transient mistaken to be permanent while in Duplex} \mid \text{Transient occurs})$

$v_2 w_2 \triangleq \Pr(\text{Successful adaptation to Simplex} \mid \text{Permanent or Leaky Transient occurs})$

$\ell_1 \triangleq \Pr(\text{Transient mistaken to be permanent while in Simplex} \mid \text{Transient occurs})$

$\sigma_2 \triangleq \lambda + \ell_2\tau$

The quantity $\ell_2$ is the transient leakage in duplex. The quantity $v_2 w_2$ is the product of
the diagnosability $v_2$ and the recoverability $w_2$. The quantity $\sigma_2$ is the average rate of
occurrence of permanents and leaky transients.

The fault occurrence/recovery state diagram of our duplex configuration is shown in
Figure 11. From the no-faults state a transient will send us to the rollback state, where a
rollback is attempted. It is successful with probability $1 - \ell_2$. If rollback is not
successful, then the fault is taken to be a permanent, from where diagnosis and recovery
is initiated. If the fault is permanent, then it is taken to be permanent with probability
1. In diagnosis and recovery, a recovery to simplex is achieved with probability $v_2 w_2$.
In simplex, only the rollback is possible.

FIGURE 11 FAULT OCCURRENCE/RECOVERY STATUS STATE DIAGRAM FOR A DUPLEX CONFIGURATION


The survivability equation that may be formulated from this diagram is the sum of the
probabilities of two mutually exclusive events: the system has no permanents or leaky
transients; or the system has a diagnosed and recovered permanent or leaky transient
and survives the remainder of the mission in simplex. The duplex survivability, $S_2$, is then

$$S_2(T) = e^{-2\sigma_2 T} + \int_0^T 2 v_2 w_2 \sigma_2\, e^{-2\sigma_2 t}\, S_1(T-t)\,dt$$

where the simplex survivability is $S_1(T) = \exp[-\sigma_1 T]$ (with $\sigma_1$ formed from $\ell_1$ in the
same way that $\sigma_2$ is formed from $\ell_2$). The solution to $S_2$ is

$$S_2(T) = e^{-2\sigma_2 T} + \frac{2 v_2 w_2 \sigma_2}{2\sigma_2 - \sigma_1}\left[e^{-\sigma_1 T} - e^{-2\sigma_2 T}\right]$$

The equation for $S_2$ suggests that a recursive definition for $S_N$ could be produced
if 2 is replaced by N and 1 becomes N-1.
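
A brief sketch of the duplex solution follows, assuming the form S2(T) = exp(-2σ2 T) + [2 v2w2 σ2 / (2σ2 - σ1)] [exp(-σ1 T) - exp(-2σ2 T)] with S1(T) = exp(-σ1 T). The function names and numerical values are hypothetical. A convenient sanity check is the textbook case of perfect diagnosis and recovery with equal rates, where the result must reduce to a simple two-unit parallel system.

```python
import math


def simplex_survivability(T, sigma1):
    """S1(T) = exp(-sigma1 * T)."""
    return math.exp(-sigma1 * T)


def duplex_survivability(T, sigma1, sigma2, v2w2):
    """Closed-form duplex survivability (assumed form, see lead-in)."""
    no_loss = math.exp(-2.0 * sigma2 * T)
    denom = 2.0 * sigma2 - sigma1
    if abs(denom) < 1e-15:
        # Limiting case 2*sigma2 == sigma1 of the convolution integral
        degraded = 2.0 * v2w2 * sigma2 * T * no_loss
    else:
        degraded = (2.0 * v2w2 * sigma2 / denom) * (
            simplex_survivability(T, sigma1) - no_loss
        )
    return no_loss + degraded


# Hypothetical rates (per hour), ten-hour mission
s2 = duplex_survivability(10.0, sigma1=1.2e-3, sigma2=2.0e-4, v2w2=0.93)
print(f"duplex survivability = {s2:.6f}")
```

Setting `v2w2` to zero removes the degraded-mode term entirely, so the duplex then survives only if neither computer suffers a permanent or leaky transient fault.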

Higher Configurations

[...] of generalizing from the duplex configuration was suggested in the
[...]. The analysis results in a recursive convolution integral equation that
[...] solution. This generalization would be applicable to all
con[...] successive redundancy degradations from N to M computers before
[...] those module types (such as computers) that a single failure
dis[...]. Figure 12 shows the sequence of events when N redundant computers
[...] at time T = 0 in the N fault-free modules state and find
[...] failing as a function of time. Faults occur at a rate
[...] and transient fault rates. After a fault occurs, we move to
the detection state. With probability $u_N$, the detectability, the fault is detected, and
we move to the transient recovery state. Failure to detect the fault is assumed to pollute
the system with errors resulting in a system failure. After detection, a transient
recovery is attempted. If transient recovery is successful, the module set is restored to N
working units. Transient recovery is unsuccessful if the fault is permanent, or with
probability $\ell_N$ (transient leakage) if the fault is transient. The unsuccessful transient
recovery leads to a permanent recovery procedure where the module set redundancy is
reduced by one. Failure of permanent recovery results in system failure.

FIGURE 12 FAULT OCCURRENCE/RECOVERY STATUS STATE DIAGRAM

In this model provision for non-unity detectability is added. This results in a
change in definition of $\sigma_N$ to [...].

For most practical values of $u_N$ and $\tau$, this will not have a large effect, but it could
have significance in some circumstances. We will also need coverage, $c_N = u_N v_N w_N$, in
our model derivation. The survivability equations for the general case are found by
adding the probabilities of two mutually exclusive events, as before. The two events are:
(a) there is no degradation of redundancy during the mission; and (b) there is a
degradation to N-1 computers, but the system survives the remainder of the mission. The
survivability equation becomes

$$S_N(T) = e^{-N\sigma_N T} + \int_0^T N c_N \sigma_N\, e^{-N\sigma_N t}\, S_{N-1}(T-t)\,dt$$

whose solution is

$$S_N(T) = \sum_{k=1}^{N} a_{Nk}\, e^{-k\sigma_k T}$$

where

$$a_{Nk} = \frac{N c_N \sigma_N}{N\sigma_N - k\sigma_k}\, a_{N-1,k}, \qquad k = 1, \ldots, N-1,$$

$$a_{NN} = 1 - \sum_{k=1}^{N-1} a_{Nk}.$$

The solution is derived by assuming the sum of exponentials, substituting for $S_{N-1}$,
and integrating.
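
The recursion can be sketched in a few lines. The code assumes the general survivability equation has the form S_N(T) = exp(-Nσ_N T) + ∫ N c_N σ_N exp(-Nσ_N t) S_{N-1}(T-t) dt over (0, T), with S_1(T) = exp(-σ_1 T), so that substituting a sum of exponentials gives coefficients a_{Nk} = N c_N σ_N a_{N-1,k} / (Nσ_N - kσ_k) for k < N and a_{NN} = 1 minus the sum of the others. The function names and all numerical values are hypothetical.

```python
import math


def survivability_coefficients(sigma, c):
    """Return a[n][k] for n = 1..N from sigma[k] (k = 1..N) and c[n] (n = 2..N)."""
    n_max = max(sigma)
    a = {1: {1: 1.0}}  # S_1(T) = exp(-sigma_1 * T)
    for n in range(2, n_max + 1):
        row = {}
        for k in range(1, n):
            denom = n * sigma[n] - k * sigma[k]
            if denom == 0.0:
                raise ValueError("closed form needs distinct exponents k*sigma_k")
            row[k] = n * c[n] * sigma[n] * a[n - 1][k] / denom
        row[n] = 1.0 - sum(row.values())  # enforces S_n(0) = 1
        a[n] = row
    return a


def survivability(T, sigma, c):
    """S_N(T) as a sum of exponentials, N being the largest key of sigma."""
    n = max(sigma)
    a = survivability_coefficients(sigma, c)
    return sum(a[n][k] * math.exp(-k * sigma[k] * T) for k in range(1, n + 1))


# Hypothetical per-level rates (per hour) and coverages for a quadruplex set
sigma = {1: 1.2e-3, 2: 2.0e-4, 3: 2.0e-4, 4: 2.0e-4}
c = {2: 0.95, 3: 0.999, 4: 0.999}
s4 = survivability(6.0, sigma, c)
print(f"six-hour failure probability = {1.0 - s4:.3e}")
```

The coefficients alternate in sign and can be several times larger than one, so for long chains of degradations some loss of numerical precision from cancellation should be expected.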

A Model for the Shuttle Display System

In the Shuttle-orbiter data processing system, the operator's data entry and display
system is known as the multifunction CRT display system (MCDS). A special model was
developed for the MCDS because of the peculiar interconnections of its redundant component
modules. It consists of the display electronics unit (DEU), display unit (DU), and
keyboard (KB). The DU is dedicated to the DEU, so we consider it a part of the DEU for
analysis purposes. There are two KBs connected to three DEUs by a switching arrangement.
The switching allows three configurations as follows:

1. KB A to DEU A; KB B to DEU B

2. KB A to DEU A; KB B to DEU C

3. KB A to DEU C; KB B to DEU B

FIGURE 13 SYMBOLIC INTERCONNECTION DIAGRAM OF THE MCDS

FIGURE 14 FAULT OCCURRENCE/RECOVERY STATUS STATE DIAGRAM FOR THE MCDS

This connection arrangement is illustrated in Figure 13. The fault occurrence/
recovery status state diagram is given in Figure 14. The detection, diagnosis, and
recovery states are omitted in the diagram for clarity. At the beginning of the mission,
the MCDS is in the no-faults state. If a keyboard fails, one of the DEUs will be
permanently deprived of a keyboard. The mission continues with a simplex keyboard and duplex
DEUs. If DEU C fails, then KB A will be dedicated to DEU A, and KB B will be dedicated to
DEU B for the remainder of the mission. If DEU A or B fails first, then one KB is
dedicated to DEU C while the other may be connected to either DEU C or B (we assume A was the
failed DEU). There are four possibilities for the next failure: (1) If the dedicated KB
fails, then the common KB may serve the remaining DEUs; we have a simplex keyboard and a
duplex DEU. (2) If the common keyboard fails, then DEU B has no access to a KB; we
complete the mission in simplex. (3) If DEU C fails, the dedicated KB has no DEU to serve.
(4) If DEU B fails, then we complete the mission with duplex KBs and simplex DEU.
We need go no farther than the duplex and simplex states because our general model applies
in this case. The resulting equation formulation and solution are quite complex. There
are three branches from the no-faults state, so that there are four mutually exclusive
events to be considered. The resulting solution covers a letter-sized typewritten page.
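
The failure cases walked through above can be checked with a small reachability sketch. It assumes that the three switch configurations imply that KB A can be connected to DEU A or DEU C, and KB B to DEU B or DEU C. The function name and data layout are hypothetical, and the function only counts usable units; it does not model the survivability equations themselves.

```python
# Keyboard-to-DEU links implied by the three switch configurations:
#   1. KB A to DEU A, KB B to DEU B
#   2. KB A to DEU A, KB B to DEU C
#   3. KB A to DEU C, KB B to DEU B
LINKS = {
    "KB A": {"DEU A", "DEU C"},
    "KB B": {"DEU B", "DEU C"},
}


def mcds_redundancy(failed):
    """Return (usable keyboards, usable DEUs) after the units in `failed` are lost.

    A keyboard is usable if it works and can reach a working DEU; a DEU is
    usable if it works and some working keyboard can be switched to it.
    """
    failed = set(failed)
    usable_kbs, usable_deus = set(), set()
    for kb, deus in LINKS.items():
        if kb in failed:
            continue
        reachable = deus - failed
        if reachable:
            usable_kbs.add(kb)
            usable_deus |= reachable
    return len(usable_kbs), len(usable_deus)


# The four second-failure cases from the text, with DEU A already failed
for second in ("KB A", "KB B", "DEU C", "DEU B"):
    print(second, mcds_redundancy({"DEU A", second}))
```

With DEU A failed, KB A is the dedicated keyboard and KB B the common one, so the four printed cases correspond to items (1) through (4) in the text.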

Modeling  I/O  Controllers  Serving  Several  Devices 

The  Shuttle  data  processing  system  presented  several  instances  in  the  flight-critical 
bus  area  where  one  controller  serves  more  than  one  device.  As  an  illustrative  example, 
consider  triply  redundant  controllers  each  serving  two  devices  as  in  Figure  15.  The  pur- 
pose of  this  type  of  configuration  is  to  have  three  copies  each  of  devices  types  A and  B 
available  to  the  computer  set  wherein  no  two  failures  disable  access  to  the  devices. 

Faults  have  a different  impact  with  this  arrangement  depending  on  where  they  occur.  A 
fault  in  Cj^  disables  both  Aj^  and  but  a fault  in  does  not  disable  . The  device 
types  are  independent  in  pairs,  but  are  in  actuality  dependent  through  the  controllers. 

The modeling technique used in the previous sections results in mathematically intractable
formulations when applied to this situation. However, a less accurate model, which was
checked by the simulator, may be formulated. There are two extreme approximations possible
with the previous modeling technique. One approach involves assuming complete unit
independence and the other is to assume total unit dependence. These represent an upper and
a lower bound, respectively, to the true survivability. An intermediate solution that
provides realistic, usable results may be obtained by taking each of the mutually
exclusive cases of the controller failure combinations and modeling survivability of the
remaining devices given that failure combination. Each possible combination that can result
in a successful mission is modeled. As an example of one of these combinations, suppose
$C_1$ fails and $C_2$ and $C_3$ don't fail. Then devices A and B must survive in duplex. To
formulate the equations for the example, we recognize that there are three controller-failed
conditions which allow the system to survive: none, one, or two failures. There are three
ways to have one or two failures and one to have none. The survivability then becomes

$$
\begin{aligned}
S(T) ={}& R_C^3(T)\, S_3(T,A)\, S_3(T,B) \\
&+ 3\, R_C^2(T)\, \bigl(1 - R_C(T)\bigr)\, S_2(T,A)\, S_2(T,B) \\
&+ 3\, R_C(T)\, \bigl(1 - R_C(T)\bigr)^2\, S_1(T,A)\, S_1(T,B)
\end{aligned}
$$

FIGURE 15 DEPENDENT I/O DEVICE EXAMPLE CONFIGURATION
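
The conditioning on controller failures can be sketched numerically. The Python below is only an illustration, not the report's program: it assumes that the survivability of k accessible devices of one type is the probability that at least one of them survives, and the names `survivability`, `at_least_one_of`, `r_ctrl`, `r_a` and `r_b` are hypothetical.

```python
from math import comb


def at_least_one_of(k: int, r_device: float) -> float:
    """Assumed device survivability: at least one of k identical devices works."""
    return 1.0 - (1.0 - r_device) ** k


def survivability(r_ctrl: float, r_a: float, r_b: float, n: int = 3) -> float:
    """Sum over the mutually exclusive controller-failure cases.

    r_ctrl, r_a, r_b are the reliabilities of one controller, one type-A
    device and one type-B device at the mission time of interest.
    """
    total = 0.0
    for failed in range(n):  # 0 .. n-1 failed controllers still allow survival
        working = n - failed
        # Probability of exactly `failed` controller failures.
        p_case = comb(n, failed) * r_ctrl ** working * (1.0 - r_ctrl) ** failed
        # Only devices behind a working controller remain accessible.
        total += p_case * at_least_one_of(working, r_a) * at_least_one_of(working, r_b)
    return total
```

With perfectly reliable controllers the sum collapses to its first term, the product of the two triplex device survivabilities.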


6.  SIMULATION 
Simulator  Design  Objectives 

Much attention has been given to improving the mission success probability (MSP) of
computer systems by the addition of protective redundancy. Such redundancy allows the
system to continue correct operation in the presence of one or more failed components.
The efficacy of this improvement is measured by the MSP increase. The mission success
probability is defined as the probability that, given that there were no failed components
or erroneous memory information present at mission inception, the hardware and software
are operating correctly at the end of the mission. Thus the system must be able to
survive both permanent and transient faults.

In order to make an accurate analytic determination of the MSP of this type of system,
all fault-tolerance processes (e.g., detection, recoveries, etc.) must be modeled.
However, for even a reasonable approximation to a real-world implementation, a mathematical
model soon becomes intractable. Simulation is then the alternative solution. The use of
simulation studies to investigate the behavior of computer hardware/software systems is
well-established. Simulation is used for those situations which are intractable to an
analytic approach, or for which the essence is lost when the prerequisite abstractions
and simplifying assumptions necessary to the analytic technique are made.

The goal in the RCS work was an approach that is applicable to a wide variety of
computer designs, and one which reflects the hardware/software interaction. Thus, a
logic-level simulation would provide needless detail, in addition to sacrificing versatility.
Hence, a modeling level of detail was chosen that permits description of system details,
but is versatile enough to accommodate different computers and configurations.
Translating these ideas into RCS simulation objectives yielded the following three items. The
simulator should produce: (1) the fault tolerance of each of a wide variety of
reconfigurable computer system configurations; (2) configuration performance parameters for
use in analytic modeling; and (3) the behavior of a configuration in various fault
environments. The requirements imposed on the simulator design by these three
objectives are examined in the following paragraphs.

The simulator should be able to produce the desired measures of fault tolerance
for a wide variety of configurations. This requirement can be satisfied in a reasonable
way by structuring the simulator such that the various fault-detection and recovery
algorithms are implemented as subroutines. Thus a configuration can be described by
specifying the applicable set of subroutines, plus the necessary parameters. This simulator
structure provides versatility and modularity, and minimizes the impact of addition of
new subroutines.

Configuration performance parameters are those required when using the analytic model
for analysis of a configuration. For example, the transient leakage in triplex has
been defined as the conditional probability that transient recovery fails, given that a
transient has occurred. If a configuration is analyzed by mathematical modeling, the
transient leakage is one of the input parameters of the model. However, it is difficult
for the designer to evaluate it, since it may depend on: the location of the transient
faults; their occurrence rate; the time between occurrence and detection of a fault; and
the recovery algorithm used. By introducing these factors into the simulation and gathering
statistics describing the computer system reaction to transient faults, the leakage can be
estimated by computing the ratio of the number of unsuccessful recoveries from transient
faults to the total number of transients. Thus, for the configurations where the
mathematical modeling is applicable, one simulation run gives an estimate of these
parameters of the modeling. Then using the model, the MSP of the configuration can be
easily determined for any given time, t.


The fault environment provided in the simulator should be sufficiently versatile to
provide all expected possibilities to test the recovery algorithm utilized in the
configuration under simulation. Thus low or high failure rates, existence and duration of
transient bursts, long transients, mathematical fault-distribution functions, etc., must
be provided. Implementation of this fault environment should be accomplished so as to
provide maximum flexibility of environment choice by the user.


Simulator  Overview 


The simulator program consists of an integrated collection of FORTRAN IV computer
programs organized to simulate the detection of faults in a reconfigurable computer
system and the computer system's successful/unsuccessful recovery actions taken in response
to the detected faults. A simulation run consists of several phases. First, the system
is initialized by obtaining the input parameters and initializing fault counters. Next
the system simulation begins. Faults are randomly generated for several missions and
placed in a table. The fault table is searched to determine the next mission in which a
fault occurs. After the mission parameters are initialized, the handling of faults is
simulated. Then the statistics for the mission (i.e., final state, number of faults,
causes of failures, etc.) are gathered. This process is repeated until all missions are
simulated, and then estimates for analytic model parameters are calculated and printed
along with the simulator statistics. A summary of the simulator utilization is shown in
Figure 16.

FIGURE 16 SIMULATOR UTILIZATION SUMMARY

A variety of configurations can be specified by the input parameters. These
configurations differ in: (1) the degree of hardware redundancy and the way this redundancy is
utilized; (2) the methods employed for detection and isolation of faults and their
effectiveness; (3) the recovery procedures that are used and their performance characteristics;
and (4) the software scheduling mechanism and parameters. The simulated computer system
may contain up to five computers configured as an adaptive quintuplex system that can
reconfigure only to a duplex system, or all the way down to a simplex system. The effects
of a variety of reliability enhancement techniques can be incorporated into the
configuration by adjusting the values of several input parameters. For example, the use of an NDRO
memory is taken into account by adjusting the program integrity parameter, and memory
parity is accounted for with the memory BITE effectiveness parameter. The simulator
accounts for system recovery from transient faults as well as permanent faults. Currently
the rollahead, rollback, memory copy, and "wait and see what happens" transient recovery
techniques can be specified for a configuration. Transient recovery is specified in the
input as (1) the recovery techniques used, (2) the recovery duration, (3) the effectiveness
of the recovery technique for each class of fault, and (4) the recovery recurrence interval.
The simulator also takes into account global software characteristics such as: the type of
scheduling mechanism, the minor and major cycle durations, and the relative
minor-cycle-program size.

The fault environment is defined in terms of probability distribution functions for
random variables representing the fault-occurrence time, the fault duration, and the fault
location. Permanent faults are assumed to have interarrival times which are exponentially
distributed, and infinite fault durations. Here it is assumed that permanent faults occur
independently of each other and are memoryless (the probability that a fault occurs between
the times, T, and T+dT, is independent of T). Transient-fault interarrival times may be
either exponentially distributed or burst-distributed. The burst distribution assumes
that transient faults have a tendency to arrive in groups. The random variable
representing the transient-fault duration may be uniformly distributed or exponentially distributed;
other distributions for fault occurrence times and durations can be defined by the simulator
user with only minor modifications.
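
A fault environment of this kind might be sampled as in the following Python sketch. It is an illustration under stated assumptions, not the FORTRAN IV implementation: the function names, the time units, and the default duration parameters are hypothetical, and the burst distribution is omitted because its form is not given here.

```python
import random


def fault_times(rate_per_hour, mission_hours, rng):
    """Fault instants (hours) built from exponential interarrival times."""
    times = []
    t = rng.expovariate(rate_per_hour)
    while t < mission_hours:
        times.append(t)
        t += rng.expovariate(rate_per_hour)
    return times


def permanent_faults(rate_per_hour, mission_hours, rng):
    """Permanent faults: exponential interarrivals, infinite duration."""
    return [(t, float("inf")) for t in fault_times(rate_per_hour, mission_hours, rng)]


def transient_faults(rate_per_hour, mission_hours, rng, duration_model="uniform",
                     max_duration_s=300e-6, mean_duration_s=100e-6):
    """Transient faults as (start in hours, duration in seconds) pairs."""
    faults = []
    for start in fault_times(rate_per_hour, mission_hours, rng):
        if duration_model == "uniform":
            duration = rng.uniform(0.0, max_duration_s)
        elif duration_model == "exponential":
            duration = rng.expovariate(1.0 / mean_duration_s)
        else:
            raise ValueError("unknown duration model: " + duration_model)
        faults.append((start, duration))
    return faults
```

Passing an explicit `random.Random` instance keeps each simulated mission reproducible from its seed.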

After all missions have been simulated, the accumulated system statistics are displayed
and used to estimate system performance measures. The mission failure probability is
estimated by finding the ratio of the number of system failures to the number of missions
simulated. Global parameters required by the analytic model are estimated from the ratios
of other simulation statistics. For example, C2, which is the probability that the system
recovers given that a fault occurs, is required by the analytic model. It is estimated
by the ratio of the number of successful transitions from duplex to simplex to the total
number of attempted recoveries in duplex. The reliability of a parameter estimate is
dependent upon the number of samples used. The simulator calculates a confidence
interval for each estimate and prints it out along with the estimated parameter.
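
A ratio estimate with its confidence interval might be computed as in the sketch below. The method the simulator actually uses for the interval is not stated here; the normal approximation and the name `ratio_estimate` are assumptions made for illustration.

```python
from math import sqrt


def ratio_estimate(successes, trials, z=1.96):
    """Return (estimate, lower, upper) for a probability estimated as a ratio.

    Uses the normal approximation to the binomial (an assumed method);
    z = 1.96 corresponds to roughly 95 percent confidence.
    """
    if trials <= 0:
        raise ValueError("at least one trial is required")
    p = successes / trials
    half_width = z * sqrt(p * (1.0 - p) / trials)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)
```

For example, 90 successful duplex recoveries out of 100 attempts give an estimate of 0.9 with a half-width of about 0.06; the interval narrows as the number of samples grows.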

Simulator  Model  Formulation 


The  approach  taken  in  the  formulation  of  the  simulator  is  an  extension  of  the  approach 
described  in  Ref.  4.  Formulating  the  simulator  in  this  way  permits  the  computer  system  to 
be  viewed  as  a finite  state  automaton.  Thus,  the  system  is  described  by  the  states  it 
may  assume  and  the  possible  transitions  between  states. 

The  computer  system  states  are  defined  by  two  conditions.  The  first  of  these  is  the 
function  being  performed  by  the  system.  Examples  of  these  are: 

a.  Normal  Operation;

b.  Recovery  Operation; 

c.  Reduced  Capability  Operation; 

d.  System  Restart;  and 

e.  System  Failure. 

The  second  of  the  system-state  defining  conditions  is  that  of  the  number  of  permanent 
faults  that  the  simulated  system  has  suffered  during  the  particular  simulated  mission 
under  consideration.  Obviously,  the  system  that  has  not  yet  encountered  a fault  will  be 
in  normal  operation,  while  a system  that  has  encountered  faults  may  be  in  recovery 
operations,  reduced  capability  operations,  system  restart,  or  may  have  failed. 

Transitions  between  states  in  the  simulated  computer  system  will  be  caused  by  either 
of  two  events.  The  first  event  that  may  cause  a transition  is  the  detection  of  a fault. 
For  example,  the  first  detection  of  a fault  in  the  Shuttle  GPC  set  causes  a transition 
to  the  delay-reconfigurable  state  which  simulates  the  FCOS  transient-recovery  method.

Later  detections  of  faults  will  cause  a state  transition  in  the  simulated  system.  The 
second  event,  the  completion  of  a recovery  procedure,  will  definitely  cause  a transition 
to  another  state.  What  state  is  the  destination  of  this  transition  depends  on  the  type 
of  recovery  procedure  attempted.  For  example,  the  successful  completion  of  a normal 
recovery  procedure  when  four  GPCs  are  operating  will  return  the  simulator  to  the  transient 
operations  state.  However,  a recovery  procedure  that  requires  deactivation  of  one  of 
three  GPCs  will  cause  the  simulated  system  to  degrade  to  the  duplex  state. 
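
The two kinds of transition-causing events can be sketched with a time-ordered event queue. The Python below is a simplified illustration, not the RCS simulator: the state set is reduced to four modes, and the rule that a second fault detected during a recovery causes failure is an assumption made to keep the example short. All names are hypothetical.

```python
import heapq
from enum import Enum, auto


class Mode(Enum):
    NORMAL = auto()
    RECOVERY = auto()
    REDUCED = auto()
    FAILED = auto()


def run_mission(fault_times, recovery_duration, recovery_succeeds, mission_time):
    """Process fault-detection and recovery-completion events in time order."""
    events = [(t, "fault_detected") for t in fault_times]
    heapq.heapify(events)
    mode = Mode.NORMAL
    home = Mode.NORMAL  # mode to return to after a successful recovery
    log = []
    while events and mode is not Mode.FAILED:
        t, kind = heapq.heappop(events)
        if t > mission_time:
            break
        if kind == "fault_detected":
            if mode is Mode.RECOVERY:
                mode = Mode.FAILED  # assumed: no nested recovery
            else:
                mode = Mode.RECOVERY
                heapq.heappush(events, (t + recovery_duration, "recovery_done"))
        else:  # recovery_done
            if recovery_succeeds(t):
                mode = home
            elif home is Mode.NORMAL:
                home = mode = Mode.REDUCED  # degrade after a failed recovery
            else:
                mode = Mode.FAILED
        log.append((t, kind, mode))
    return mode, log
```

Nothing is computed between events, which is where an event-driven structure saves computer time compared with stepping through every cycle of a mission.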

An  important  aspect  to  be  noted  when  considering  the  organization  of  the  RCS  simulator 
is  that  it  is  an  "event-driven"  simulation.  Thus,  the  initial  state  transition  is  only 
made  when  an  event,  in  this  case  either  a permanent  or  transient  fault,  occurs.  Use  of 
this type of structure provides a significant saving in computer time.

Figure 17 presents the simplified state diagram of an adaptive NMR configuration that
employs rollahead, rollback, and memory copy for transient-fault recovery. A count of
the number of active computers is maintained so that states I, II, III, and VII do not
need to be duplicated for the operation of more than three computers. In the Normal
Operation state with three or more computer units, the outputs of the computers are
periodically compared. Disagreement of one or more computers constitutes fault detection
and requires exit from this state. As long as two computers are fault-free, the
rollahead recovery procedure is used and, if it is not successful, the memory copy. If all
computers disagree at the same time, a system restart is initiated.


The Rollahead state is entered to simulate the computer system's attempt to recover
from a detected single fault. The state vector (consisting of program variables and all
register contents) of one good computer is used to replace the non-agreeing computer's
state vector. However, not all transient failures are corrected by this procedure, since
a bad instruction cannot be restored. The approach taken in the simulation is to provide
for the specification of a rollahead success probability. This probability can be
formally defined as:

$$
P_{suc} = P\,[\text{Fault is corrected given that a fault has occurred, has been detected, and its physical cause has disappeared when correction begins}]
$$

An analysis, which gives consideration to the type of memory (e.g., 2 1/2D, 3D, DRO,
NDRO, etc.) and the consequences of memory faults, will yield an estimate of the
rollahead success probability (or program integrity).

The Memory Copy recovery procedure is entered after a specified number of rollaheads
have been completed unsuccessfully. The memory contents of one good memory are
transferred into the faulty memory. In order to avoid interruption of computation, the
transfer is effected on the basis of cycle stealing. It ends with the updating of the
state vector of the faulty computer. Since, during a memory copy, normal application
routines continue, it is possible that a new fault shows up. The following (conservative)
assumption has been made in order to simplify the simulation. Upon detection of a second
fault during a memory copy, the memory copy procedure is abandoned and the computer for
which this memory copy was intended is discarded. It is assumed that memory copy provides
recovery from transient faults which have disappeared when the memory copy began with a
probability equal to the memory copy efficacy.
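
The rollahead and memory copy sequence for one transient fault might be sampled as follows. This is a sketch under assumptions, not the simulator's code: it treats the rollahead success probability as the program integrity, assumes that a rollahead can never succeed once program memory is damaged, and assumes that a unit is discarded when the memory copy also fails. The function and argument names are hypothetical.

```python
import random


def recover_transient_nmr(program_integrity, memory_copy_efficacy, rng,
                          max_rollaheads=2):
    """Return (outcome, rollaheads_attempted) for one detected transient."""
    # Decide once whether the transient damaged program memory.
    memory_damaged = rng.random() >= program_integrity
    if not memory_damaged:
        return "rollahead", 1  # the state vector transfer is sufficient
    # Rollaheads cannot restore a bad instruction, so all of them fail
    # and the memory copy is entered.
    if rng.random() < memory_copy_efficacy:
        return "memory_copy", max_rollaheads
    return "unit_discarded", max_rollaheads
```

Counting the three outcomes over many sampled transients gives the kind of statistic from which leakage-type parameters are estimated.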

The  System  Restart  state  is  entered  when  all  computers  disagree  upon  comparison.  The 
recovery  procedure  from  this  state  may  consist  of  a memory  verification.  Relevant  memory 
locations  are  read,  voted  upon,  and  restored.  Extensive  diagnosis  may  also  be  run. 

Finally,  if  a backup  memory  is  available,  reloading  may  be  possible.  Then  the  application 
program  is  reinitiated  from  the  restart  point.  After  a successful  system  restart,  the 
system  returns  to  the  normal  operation  state.  However,  since  all  computers  stop  their 
normal  computation  during  a system  restart,  this  recovery  procedure  is  time-critical. 

Note that in a benign fault environment, the probability of having a system restart is
quite small (about 1 for 1 million faults). However, system restart is necessary if the fault
environment is so harsh that bursts of faults can hit several computers at a time or if
the probability of a short power failure is not negligible.

If  a spare  is  available,  it  should  be  activated  once  a permanent  fault  has  been 
recognized.  As  part  of  the  activation  process,  the  spare  is  checked  and  conditioned 
by  one  of  the  good  computers.  In  the  situation  depicted  in  the  state  diagram  of  Figure  17, 
spares  are  not  available  for  the  duplex  and  simplex  simulation.  This  is  thought  to  be 
compatible  with  the  expected  applications. 

The  Duplex  operation  state  is  entered  upon  the  determination  that  a permanent  fault 
exists  in  one  of  the  three  computers  in  the  computer  system.  This  state  is  quite  similar 
to  the  normal  operation  (N  units)  state,  except  that  the  only  available  recovery  procedure 
is  program  rollback. 


The  Rollback  state  is  entered  upon  the  detection  of  a fault  when  the  computer  system 
is  in  the  Duplex  operation  state.  Rollback  is  the  term  used  to  describe  repetition  of 
the  program  segment  executed  just  prior  to  the  detected  output  disagreement.  The  state 
vector  at  the  beginning  of  each  program  segment  is  maintained  in  order  that  the  rollback 
procedure  may  be  accomplished.  After  the  program  segment  has  been  repeated,  the  outputs 
of  the  two  computers  are  compared;  if  the  correction  is  successful,  the  computer  system 
switches  back  to  the  Duplex  operation  state.  If  the  output  differs,  the  system  rolls 
back  again:  this  unsuccessful  recovery  process  continues  a predetermined  number  of  times 
before  changing  the  computer  system  state  to  diagnosis.  Since  both  of  the  active  computers 
remaining  in  the  computer  system  must  stop  their  normal  computations  during  a rollback, 
this computer recovery procedure may be time-critical. However, if comparisons are
frequent enough, a rollback should not last more than a few milliseconds.


A disagreement upon comparison in duplex does not indicate which of the computers
produced the wrong value. Thus the Diagnosis state must be entered. To accomplish
diagnosis, self-tests are run. If they are successful, the faulty computer is isolated
and the system switches to simplex. If unsuccessful, the system is unable to decide
which computer is faulty and the system fails. Diagnosis programs are obviously
time-critical. Note that it would be possible to include a memory copy which would take place
once a diagnosis had been successful: the memory of the good computer would be copied
into the bad one. However, this improvement is not as good as it would seem since many
transients cannot be detected through diagnosis.
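
The duplex path from rollback through diagnosis can be sketched as below. It is a simplified illustration: treating each rollback attempt as an independent trial with a fixed success probability is an assumption, and the names are hypothetical.

```python
import random


def recover_in_duplex(rollback_success_prob, self_test_efficiency,
                      max_rollbacks, rng):
    """Return (resulting_state, rollbacks_used) after a duplex disagreement."""
    for attempt in range(1, max_rollbacks + 1):
        # Repeat the program segment and compare the two outputs again.
        if rng.random() < rollback_success_prob:
            return "duplex", attempt
    # Rollbacks exhausted: run self-tests to find the faulty computer.
    if rng.random() < self_test_efficiency:
        return "simplex", max_rollbacks
    return "system_failure", max_rollbacks
```

The time spent in rollbacks and diagnosis is not modeled here; a fuller sketch would also fail the mission when the available recovery time expires.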

In Simplex operation, comparison is no longer available for detection of faults. We
must rely mostly on the BITE to detect faults. CPU transients are difficult to detect.
Some may be caught through go/no-go counters and storage protection. Memory faults are
easier to detect. Parity check is especially useful. When a fault is detected, a
rollback is initiated. If the fault is not detected, a failure occurs.

Rollback in Simplex is the same procedure used in duplex. Since it is the only
recovery algorithm available in simplex, it is repeated as long as it is not successful.
If recovery from the fault cannot be effected, a system failure will occur when the
system has been down too long.

The System Failure state is entered when the system is unable to run properly any
longer or when computational requirements have not been met for too long a period of
time. Upon recognition of the condition of a system failure, the Driver program
discontinues the simulation of a mission.

Causes  of  failures  are: 

a.  Excessive time in rollahead, memory copy, or rollback: This should not
    happen since the system must be designed so that a recovery procedure
    does not endanger it. However, it might happen that the continuous
    repetition of such procedures could be fatal for the successful
    completion of the mission.

b.  An overly long system restart: A system restart is a very rarely called
    procedure. But it is long (a few seconds), and may not always be
    tolerable.

c.  Diagnosis incomplete when available recovery time expires: Normally,
    diagnosis follows rollback. It is possible that these two recovery
    procedures sometimes take too long.

d.  Undetected faults in simplex.

e.  A too long rollback in simplex: This happens when a permanent fault occurs
    or when a non-recoverable transient occurs.

f.  IOP failures: In the case of non-dedicated EEMs, the system fails
    when all IOPs fail or when all but one fail and the computers are
    unable to decide which is the good EEM.

g.  Bus failures: The system fails when all buses fail or when all but
    one fail and the computers are unable to decide which is the good bus.

h.  Actuator/sensor failures.

Simulator Inputs

The inputs required by the simulator are summarized in Table I. The use of some of
these inputs is discussed below. The detection probabilities are the probabilities that
a computer detects its own faults (except through diagnosis). This is not significant
for NMR configurations (N ≥ 3) since all faults are detected and located through voting
or comparison. However, these probabilities become critical in duplex and simplex. In
duplex, faults are detected through comparisons. However, BITE or self-test is needed to
isolate the faulty computer. In simplex, BITE is necessary, since it provides the only
means for detecting transient faults. For simplex operation the detection probability of
CPU faults is low. Faults in the CPU usually cause only a wrong output which will not be
detected by BITE. However, some will be detected. Those are the ones which cause a
forbidden address to be computed or those which modify the computing sequence in such a
manner that a go/no-go counter detects them. IBM estimates this detection probability to be
about 35%. The main technique to detect a memory fault is parity encoding. When it
exists, the probability of detecting a memory fault is usually better than 80%. When it
does not exist, this probability is quite small. Self-test programs (diagnosis) are run
in a duplex system where a fault has been detected but not isolated. Note that if the
fault is transient, the self-test will probably not diagnose it, since it usually
dissipates before the test is run.
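
The role of the BITE detection probabilities in simplex can be illustrated with a short sketch. The default values 0.35 and 0.80 are taken from the figures quoted above (the second as a floor for memory with parity); the function name, the outcome labels, and the single `recoverable` flag are hypothetical simplifications.

```python
import random


def simplex_fault_outcome(location, recoverable, rng,
                          p_detect_cpu=0.35, p_detect_memory=0.80):
    """Classify the result of one fault occurring during simplex operation."""
    if location not in ("cpu", "memory"):
        raise ValueError("location must be 'cpu' or 'memory'")
    p_detect = p_detect_cpu if location == "cpu" else p_detect_memory
    if rng.random() >= p_detect:
        return "failure_undetected"  # BITE missed the fault
    # A detected fault triggers rollback, the only recovery left in simplex.
    return "recovered_by_rollback" if recoverable else "failure_rollback_too_long"
```

With these defaults roughly two out of three CPU faults go undetected in simplex, which is why the detection probabilities dominate the simplex results.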

If the configuration includes some additional hardware for the Input/Output Processor,
the consequence of faults in this hardware has to be assessed. We partitioned the
configurations in two classes. In the first class (dedicated IOPs), we assume that a fault
in the IOP is equivalent to a fault in the computer and sometimes on the corresponding
bus. In the second one (non-dedicated IOPs), we assume that IOPs are independent from
the computers. The system can work as long as one computer and one IOP are good. Note
that the dedicated case includes software TMR.

In the present simulator, the recovery procedure for an NMR system is the state vector
transfer. Memory copy is optional. Once a recovery procedure has failed for a certain
fault, it is useless to attempt to recover through the same procedure. Some other one has
to be chosen. If after completion of a recovery procedure a fault recurs in the same
computer after a time less than the unacceptable recurrence interval, the system decides that
the recovery procedure was unsuccessful and attempts something else. Usually, the
recurrence intervals will be chosen equal to the duration of one major cycle. The
rationale is that the memory is thoroughly exercised in one major cycle. The memory-copy
efficacy is the probability that a memory copy corrects a transient fault. The only reason
why it would not succeed is that the transient might have hit the microprogram initiating
the memory copy. This is very unlikely since this program resides in a read-only memory
or microstore. The rollahead efficacy and the rollback efficacy for faults that do not
cause program memory damage must also be specified, but they are generally assumed to be
one. The program integrity is listed with the other recovery algorithm characteristics
because a transient recovery algorithm not involving memory refresh cannot succeed when
there is a program memory damage. Program integrity is strongly linked to the type of
memory: an NDRO memory is much better in this respect than a DRO memory. The fact that
there is no need to restore the information makes it very unlikely that a transient fault
damages instructions or constants. In addition, in most NDRO applications, the write
voltage for the program memory is disabled except when altering the program under AGE
control.

TABLE I REQUIRED SIMULATOR INPUTS - GPC PARTITION

NUMBER OF SIMULATED MISSIONS

MISSION-DEPENDENT PARAMETER
    Mission Time

MACHINE-DEPENDENT PARAMETERS
    Permanent Failure Rates
    BITE Detection Probability of a CPU Fault
    BITE Detection Probability of a Memory Fault
    Self-Test Program Efficiency
    Self-Test Program Duration

CONFIGURATION-DEPENDENT PARAMETERS
    Number of Computers
    Number of Spares
    Dedicated/Non-Dedicated IOPs (Input/Output Processors)
    Probability that an IOP Fault Hits the Bus
    Number of Non-Dedicated IOPs
    Applicable Recovery Algorithms
    Recovery Algorithm Characteristics
        Duration
        Unacceptable Recurrence Interval
        Maximum Number of Rollbacks
        Program Integrity
        Memory Copy Efficacy

SCHEDULING PARAMETERS
    Iteration Period
    Time Between Comparisons
    Major and Minor Cycle Durations
    Asynchronous/Synchronous Mechanism

ENVIRONMENT-DEPENDENT PARAMETERS
    Transient Failure Rates
    Transient Failure Duration

An important point in the application of CAST to the Shuttle data processing
subsystem is the determination of simulator input parameters. There are several methods for
obtaining them if their values are not obvious: Failure rates and built-in test detection
probabilities are usually obtained from the manufacturer. Parameters affecting transient
fault recovery such as the Program Integrity or transient leakages can be determined by
engineering analysis or by logic level simulation. Parameters that couldn't be obtained
from the manufacturers were estimated by an engineering analysis. One of the required
simulator inputs is called program integrity (PI). This simulator input is the probability
that a transient fault in the GPC memory does not alter a program word.

We use a "top-down" approach by subdividing the GPC memory into functional components
and then in turn further partitioning these functional components. For each transient
failure mode within a component we determine whether memory will: always be corrupted;
be corrupted only if the component is used; or never be corrupted. The expression for
the program integrity can be written as one minus the probability that a transient fault
alters a program word. Thus PI is written as

$$
PI = 1 - \frac{\sum_i \sum_j n_i\, \beta_{ij}\, \alpha_{ij}}{\sum_i \sum_j n_i\, \beta_{ij}}
$$

where: $\beta_{ij}$ is the rate of occurrence of transient failure mode j in component i,

$\alpha_{ij}$ is the probability that transient failure mode j in component i corrupts
memory, and

$n_i$ is the number of components of type i.
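
A program integrity calculation from the three quantities just defined might look like the sketch below. It assumes one reading of the definitions, namely that the probability of corruption is the rate-weighted average of the per-mode corruption probabilities; the function name and the tuple layout are hypothetical, and the numbers in the example are made up for illustration.

```python
def program_integrity(modes):
    """Compute PI from (n_i, rate_ij, p_corrupt_ij) tuples.

    n_i          number of components of type i
    rate_ij      occurrence rate of transient failure mode j in component i
    p_corrupt_ij probability that this mode corrupts memory
    Assumes PI = 1 - sum(n * rate * p_corrupt) / sum(n * rate).
    """
    modes = list(modes)
    total_rate = sum(n * rate for n, rate, _ in modes)
    if total_rate <= 0:
        raise ValueError("total transient rate must be positive")
    corrupting_rate = sum(n * rate * p for n, rate, p in modes)
    return 1.0 - corrupting_rate / total_rate


# Illustrative values only: one mode that always corrupts memory and
# one of equal rate that never does.
example = [(1, 2.0, 1.0), (1, 2.0, 0.0)]
pi_value = program_integrity(example)
```

Because only the ratio of rates enters, the result does not depend on the time unit chosen for the rates.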

The first partitioning of a 16K - 2 1/2D core memory as found in the IBM 4 Pi AP-101
basic configuration is shown in Figure 18. This partitioning divides the memory into the
timing page and four storage pages.

FIGURE 18 PARTITIONING THE AP-101 MEMORY

Further partitioning continues as shown in Figure 19 for a storage page. We see from this
partitioning that a transient in the output buffer will only corrupt the memory output,
but a transient in the data register would surely corrupt memory during the restore cycle
as well as the memory output.

Consider the case of a Y-driver as shown in Figure 20. If a transient strikes a
powered Y-driver, then any Y-driver failure mode will corrupt memory during the read and/or
restore cycle. The corruption probability for a Y-driver then becomes the probability that it
is selected while a transient is active. The Y-driver on the page has a 1/32 probability
of being used, and for a 16K memory, the page of the driver of interest has a 50 percent
probability of being used. If we assume program words are accessed every 3 μs, then the
quantity for one Y-driver becomes

$$ \sum_{n=1}^{\infty} \left[ 1 - \left( \frac{63}{64} \right)^{n} \right] P(T_d = 3n\ \mu s) $$
where $T_d$ is a discrete random variable representing transient duration. If we assume it
is uniform from 3 to 300 μs at intervals of 3 μs for ease of computation, then this quantity
becomes

$$ 1 - \frac{1}{100} \sum_{n=1}^{100} \left( \frac{63}{64} \right)^{n} = .57 $$

Computing these probabilities as above for the remaining functional components and finding the $\beta_{ij}$'s
as is done for permanent faults, program integrity is found to be .30.
FIGURE 19  PARTITIONING OF THE AP-101 MEMORY STORAGE-PAGE

FIGURE 20  Y-DRIVER ANALYSIS
Input/Output Subsystem Simulation

In general, the input/output network is not specified by the simulator's input deck
because the configuration of the I/O devices associated with a data processing system is
very application-dependent. For a given application, a set of routines must be programmed
to simulate the system's handling of faults occurring in the I/O equipment group.

These  routines  are  invoked  upon  occurrence  of  an  I/O  fault  to  simulate  the  entire  fault- 
recovery  process  (i.e.,  fault  detection,  faulty  device  isolation,  and  system  recovery) 
and  return  information  concerning  the  impact  of  the  fault  on  the  system  to  the  calling 
program.  After  they  are  programmed,  the  I/O  simulation  routines  must  be  interfaced  with 
the RCS simulator program. Simulation routines were developed for the flight-critical
bus subsystem of the ALT Space Shuttle, and the methodology employed could be used for
other I/O subsystems.

Figure 21 shows the layout of the flight-critical bus (FCB) equipment group of the
Space Shuttle (ALT). The eight flight-critical buses, FC1 - FC8, are interfaced with all
GPCs (computers). Each dedicated display unit (DDU) is interfaced with three buses by
means of three redundant ports. Each flight-forward Multiplexer-Demultiplexer (MDM) is
interfaced with two buses by means of a primary port and a secondary port. If the
electronics associated with a primary port fail, the secondary (backup) port is switched in.
Each interface unit (MDM or DDU) controls several dedicated and/or non-dedicated devices
(non-dedicated devices are shaded and can be accessed through more than one MDM). These
devices are redundant (e.g., ACCEL1, ACCEL2, and ACCEL3 perform identical functions); thus
one of them can fail without causing a system failure.
FIGURE 21  FLIGHT-CRITICAL BUS CONNECTIONS (ALT)


It was impractical to use the same method for modeling the flight-critical bus equipment
group as was used to model the computers because the FCB equipment group's complexity
would have resulted in a multitude of states. The behavior of the I/O equipment group is
represented by a set of tables and some procedures. The tables define the current state
of the system, i.e., the device redundancy, the device interconnections and the device
status. The procedures define the fault-induced system action, the resulting table
modifications (i.e., state transition) and the success of recovery. Both the built-in
test equipment and the redundancy management software are factored into the implementation
of these procedures, since they determine the fault detection, isolation and recovery
success probabilities.
For example, the interface between the flight-critical buses and the Interface Units
(IU) is reflected by Figure 22. Each row corresponds to a flight-critical bus and each
column corresponds to an IU. An element that is indexed by a particular bus and IU (row
and column) is assigned a number according to the following scheme:

0 - The bus does not have a functional interface with the IU.

1 - The bus has an active interface with the IU.

2 - The bus has a functional, but inactive, interface with the IU (i.e., this
represents a secondary port).

Thus from Figure 22, it can be inferred that MDM FF1 is interfaced with flight-critical
buses FC1 and FC5. FC1 is connected to the primary (active) port of MDM FF1, and FC5
is connected to the secondary port. Note that each DDU has three active ports. Here it
is assumed that display information is transmitted on buses FC1 - FC4, and the actual bus
used by a DDU is selected by a manual switch on its control panel. The interface between
the interface units and their associated devices is represented by additional tables.
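The table-plus-procedure representation described above might be sketched as follows. This is a minimal illustration, not the simulator's actual code: only the FF1 entries (FC1 active, FC5 secondary) come from the text, while the column name DDU1, its choice of three buses, and the assumption that port switching always succeeds are hypothetical.

```python
NO_INTERFACE, ACTIVE, STANDBY = 0, 1, 2   # the 0 / 1 / 2 coding scheme

BUSES = [f"FC{i}" for i in range(1, 9)]   # FC1 .. FC8


def make_matrix():
    """Build a bus-by-IU table with two illustrative columns."""
    matrix = {bus: {"FF1": NO_INTERFACE, "DDU1": NO_INTERFACE} for bus in BUSES}
    matrix["FC1"]["FF1"] = ACTIVE        # primary port (from the text)
    matrix["FC5"]["FF1"] = STANDBY       # secondary port (from the text)
    for bus in ("FC1", "FC2", "FC3"):    # assumed: three active DDU ports
        matrix[bus]["DDU1"] = ACTIVE
    return matrix


def has_active_port(matrix, iu):
    return any(matrix[bus][iu] == ACTIVE for bus in BUSES)


def fail_port(matrix, bus, iu):
    """Apply a port failure as a table modification (state transition).

    If the failed port was the last active one, a standby port is switched in.
    Returns True if the IU can still be reached afterwards.
    """
    was_active = matrix[bus][iu] == ACTIVE
    matrix[bus][iu] = NO_INTERFACE
    if was_active and not has_active_port(matrix, iu):
        for other in BUSES:
            if matrix[other][iu] == STANDBY:
                matrix[other][iu] = ACTIVE   # assumed: switching always succeeds
                break
    return has_active_port(matrix, iu)


m = make_matrix()
ff1_after_primary_loss = fail_port(m, "FC1", "FF1")    # FC5 is switched in
ff1_after_backup_loss = fail_port(m, "FC5", "FF1")     # no port is left
ddu1_after_one_loss = fail_port(m, "FC2", "DDU1")      # two ports remain
```

A fuller model would replace the deterministic switch with a draw against the detection, isolation and recovery success probabilities mentioned earlier.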
FIGURE 22  BUS - IU INTERCONNECTION MATRIX

SOFTWARE RELIABILITY: ANALYSIS AND PREDICTION


Martin  L.  Shooman* 

Department  of  Electrical  Engineering 
Division  of  Computer  Science 
Polytechnic  Institute  of  New  York 
Brooklyn,  New  York  11201 

SUMMARY 

With the advent of large sophisticated hardware-software systems developed in the 1960's, the
problem of computer system reliability has emerged. The reliability of computer hardware can be
modeled in much the same way as other devices using conventional reliability theory; however, computer
software errors require a different approach.

The paper begins by describing the types and causes of software errors and provides working
definitions of software errors and software reliability. Some of the basic data on frequency of
occurrence of errors is then discussed. The paper then summarizes and references some of the software
reliability models which have been proposed and concentrates on one developed by the author.

This newly developed probabilistic model predicts reliability based on the initial number of errors
in a program, the number removed, and the number remaining in the program. The model constants are
calculated from operational test data taken on the software performance.

The calculations result in a decreasing probability of no software errors versus operating time
(reliability function). The rate at which the reliability decreases is a function of the man-months of
debugging time. Similarly, the mean time to occurrence of operational software errors (MTTF) is obtained.
The MTTF increases slowly and then more rapidly as the debugging effort (man-months) increases. The
model permits estimation of software reliability before any code is written and allows later updating to
improve the accuracy of the parameters when integration or operational tests begin.


1.0 INTRODUCTION

1.1 The Age of Large Computers

The first question that one hears when the term software reliability is mentioned in discussion is,
"What is that?" As the digital computer continues to pervade more and more of our modern technology, we
rely on its output more and more for control, data recording, analysis, and decision making. Thus, the
size and complexity of the required tasks, the computer hardware, and the computer software have
drastically increased in the last three decades. With such huge size and complexity, it is virtually impossible
to definitively specify the problem (without error), to make failure-free computer hardware, and to
remove all errors from the software. The engineer must pose and help to answer the following questions:
How often can the system fail in use and still be considered acceptable? What percentage of these
failures is due to miswritten or misinterpreted specifications? What percentage is due to hardware failures?
Lastly, the subject of this paper, what percentage is due to software errors?

During the decade of the 1940s computers were born. In the 1950s commercial hardware and
assembly programs became available. In the 1960s programming languages became popular, complex
operating systems emerged, hardware became huge and sophisticated, and large programs consisting of
several hundred thousand words of code became the norm. By the 1970s one began to speak of automatic
programming, huge virtual memory and interconnecting networks of computers. Along with these new
developments one begins to speak of programs containing millions of words of code and performing huge and
complex real-time tasks.

The growth in computer hardware can be illustrated in several ways. First of all, the total number
of computer installations in the United States has grown phenomenally (see Table 1-1), reaching about
50,000 in 1970. Sackman quotes an estimate that the European Economic Community will have
approximately 10,000 computers by 1970. Most of the above-mentioned computers are large general purpose
scientific and business computers. There are also many special purpose military and industrial
computers. A detailed count is difficult to obtain; however, in Ref. 3, an estimate of the current U.S. Air Force
annual expenditures for computer hardware is quoted between $300 and $400 million per year. The
corresponding estimate of the cost of procuring computer software is $1,000 million to $1,500 million!!
Future estimates of the hardware vs. software split in costs are shown in Figure 1-1. Thus, software
costs already exceed hardware costs by a factor of 3 or 4 in the U.S. Air Force and are rising rapidly.**




Along with this growth has come a realization that the largest effort in developing software is due
to software integration, test, correction, retest, operational release, correction and rerelease. Actual
writing of the first set of code is a small task in comparison. An even more compelling observation is
that computers are increasingly being used as the heart of complex real-time systems such as air traffic
control, vehicle control, space systems, and military systems. System reliability is the most important

* This work was supported by the Office of Naval Research, Statistics and Probability Program under
contract N00014-67-A-0438-0013.

**Recent estimates on the cost of software to the entire U.S. economy range from $10-$19 billion.
performance measure in such systems. After all, if a large batch computer crashes occasionally, it
probably only means the users enjoy one rather than three turnarounds that day.* Similarly, if a
time-sharing system goes down infrequently for half an hour, 50 annoyed users have an unanticipated coffee
break.* It is a far different story if the air traffic control system handling the metropolitan New York
area crashes on a stormy night under saturated conditions, and requires a several-minute system
reloading delay before targets again appear on the video display terminals! In any of these situations, it
is clear that a vital measure of computer system performance is both the hardware and software
reliability.

1.2  Hardware  vs.  Software  Errors 

I am sure the reader agrees that software is a costly and time-consuming product, but why must we
worry about software errors? Aren't all or the majority of software errors removed during the debugging
and test phases? Unfortunately not. Experienced software managers are fond of telling "war stories"
(not unlike the retired general who tells everyone about his experiences in the great war) about difficult
and troublesome bugs. For example, Ref. 3 cites "a software error aboard a French meteorological
satellite caused it to 'emergency destruct' half of its force of weather balloons instead of interrogating
them." In another case, that of the Apollo spacecraft guidance computer, the similarity of program names
led to the wrong program being called, which destroyed the guidance system's parameters, necessitating
a lengthy reinitialization.

The most dramatic story of a software error comes from the popular American science fiction
movie, "2001: A Space Odyssey." The dialogue occurs between Mission Commander David Bowman
and the computer HAL while the space ship controlled by HAL is on its way to the planet Jupiter. HAL
detects and Bowman replaces a faulty component. Bowman orders, "HAL, carry out fault prediction
tests" (on the removed module). "Circuit fully operational," reported HAL after only ten seconds.

But Mission Control, following these actions, has its own interpretation: "... this is Mission Control
... there is another possibility. Your computer may have made an error in predicting the fault. Both
our own HAL computers agree in suggesting this ..."

Bowman drummed his fingers on the console ... "HAL ... is something bothering you - something
which might account for this problem?"

"Look, Dave, I know you are trying to be helpful. But my information processing is normal ...
check my record; you will find it completely free from error."

"I know about your service record, but ... anyone can make mistakes."

"I don't want to insist, Dave, but I am incapable of making an error ..."

"Hello, this is Mission Control. We have completed the analysis of your AE-35 difficulty, and both
our HAL computers are in agreement. The trouble lies in the prediction circuits, and we believe that it
indicates a programming conflict which we can only resolve if you disconnect your HAL computer and
switch to Earth mode control."

As Bowman began to switch off the computer, HAL fought back, and Bowman only gained control of
his spaceship by dismantling the computer's memory.

Our software problems haven't gone this far yet, or have they?

Although  the  above  three  examples  all  dealt  with  space  missions,  mainly  because  they  make  the  point 
so  vividly,  many  similar  examples  of  the  problems  caused  by  software  failures  in  military  and  industrial 
computer  systems  can  be  cited. 

1.3 Some Computer Failure Data

We may shed further light on the distribution between hardware and software errors by considering
some actual data. The data given in Table 1-2 list hardware and software failure rates for a typical
real-time data acquisition system. The data represent 9 months of operation totaling 1701 hours.
Inspection of the data shows that 48% of the failures were due to software. This is a startling figure if one
realizes that although most computer projects estimate and predict the hardware reliability, and use
redundancy techniques and high reliability parts to improve the hardware reliability, the software is left to
the skill and hard work of the programming team with no quantitative assessment of design progress. We
may obtain a typical estimate of what type of system reliability can be obtained in practice. Assuming a
simple failure model, we may compute the mean time to failure, MTTF, as the reciprocal of the failure
rate. Thus, the MTTF ≈ 30 hours for the data in Table 1-2. In other words about one failure per day
will occur if the equipment is used 24 hours/day and one failure every three days if used 8 hours/day.
(Additional data is given in Ref. 37.)
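The arithmetic behind these figures can be sketched as below, assuming the simple constant-failure-rate model named in the text. The function names are illustrative, and the failure count of 57 is back-calculated from the quoted MTTF and operating hours; it is not a value read from Table 1-2.

```python
def mttf_hours(operating_hours, failures):
    """MTTF as the reciprocal of the failure rate (failures per hour)."""
    if failures <= 0:
        raise ValueError("need at least one failure to estimate MTTF")
    return operating_hours / failures


def days_between_failures(mttf, hours_used_per_day):
    """Calendar days between failures for a given daily usage."""
    return mttf / hours_used_per_day


MTTF = 30.0                 # hours, as quoted in the text
OPERATING_HOURS = 1701.0    # hours, as quoted in the text

implied_failures = OPERATING_HOURS / MTTF             # about 57 failures
days_continuous = days_between_failures(MTTF, 24)     # round-the-clock use
days_single_shift = days_between_failures(MTTF, 8)    # 8 hours/day use
check_mttf = mttf_hours(OPERATING_HOURS, 57)          # 57 is an assumed count

print(days_continuous, days_single_shift)
```

The two intervals come out somewhat longer than one and three days, which is consistent with the rounded statements in the text.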

1.4 Computer Engineering

Historically, hardware reliability has always been a discipline which logically fell within the scope
of system engineering. Similarly, software engineering must logically include software reliability. One
huge difference exists in this analogy. In the late 1940s and early 1950s when hardware reliability was
born as a discipline, hardware systems engineering was already a well established and exercised field.
Unfortunately, software production is still largely an art, so that evolution of the field of software
reliability must occur in parallel with the development of software engineering.

*Although less dramatic, such errors if too frequent can have severe economic and operational
effects on batch and time-shared computation.
In this section we try to briefly discuss some of the scant knowledge of software engineering which
directly relates to software reliability. One facet inherent in any engineering design is schedule and
manpower needs. In Fig. 1-2 we see estimates of the speed at which code is produced measured in lines
of machine code per man hour. If we approximate the median between the 10th and 90th percentile in
Fig. 1-2, we obtain: in 1955 machine language programming productivity was 200 machine instructions/
man month; in 1970 programming in a higher level language has raised this figure to about 700 machine
instructions/man month; the projection for 1985 using structured programming techniques (top down,
go-to free, cf. Ref. 23) is 1000 machine instructions per man month. At the outset we might say that the
switch from machine language to higher level language programming (FORTRAN, etc.) was of great help
in that it increased productivity by a factor of 3.5. A deeper consideration of these facts makes this
progress appear less spectacular. In the compilation process each line of code written in a higher level
language expands into between 5 and 10 lines of machine code. Thus, we might say that in going from
machine to higher level language we have decreased productivity by a factor of two due to the time to write
an instruction, yet increased productivity by a factor of about seven due to the expansion of code by the
compiler. This is only part of the story since this only measures code productivity. From Table 1-3 we
see that this phase only represents from 15% to 25% of the total effort.
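The two opposing factors in this argument can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the expansion ratio of 7 is the text's "about seven," taken here as a single assumed value within the stated range of 5 to 10.

```python
MACHINE_INSTR_1955 = 200    # machine instructions per man month, hand-written
MACHINE_INSTR_1970 = 700    # machine instructions per man month, via a compiler
EXPANSION = 7               # assumed machine instructions per source line

# Apparent gain when counting machine instructions delivered.
apparent_gain = MACHINE_INSTR_1970 / MACHINE_INSTR_1955

# Lines a programmer actually writes per man month in the higher level language.
source_lines_1970 = MACHINE_INSTR_1970 / EXPANSION

# Written lines per man month relative to 1955: the "factor of two" decrease.
written_line_ratio = source_lines_1970 / MACHINE_INSTR_1955

print(apparent_gain, source_lines_1970, written_line_ratio)
```

The product of the two effects, 0.5 times 7, recovers the overall factor of 3.5.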

If we use the SAGE data in Table 1-3, coding represents 15% of the effort. Also using the
previously quoted figure that a programming project done in a higher level language proceeds at a rate of about
700 instructions per man month or 8400 machine instructions per man year, yields the following
conclusions: A 12,000 word FORTRAN program would yield about an 84,000 word machine language program
and would require about 10 man years to code. In addition, about 30 man years would be
required for checkout and test (debugging). Also, about 27 man years would be required for analysis and
design. One must take care in working with these gross rules of thumb to remember that time can only
be traded for manpower within reason. The favorite illustration is: it takes one woman 9 months to
produce a baby but 9 women can't produce a baby in one month (see Ref. 39).
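The effort estimate above follows from three quoted numbers and can be reproduced as a short calculation. This is a sketch of the rule of thumb only; the expansion ratio of 7 is implied by the 12,000 and 84,000 word figures, and the variable names are illustrative.

```python
SOURCE_WORDS = 12_000         # FORTRAN program size from the text
EXPANSION = 7                 # machine words per source word (84,000 / 12,000)
INSTR_PER_MAN_MONTH = 700     # higher level language rate from the text
CODING_SHARE = 0.15           # coding fraction of total effort (SAGE figure)

machine_words = SOURCE_WORDS * EXPANSION
instr_per_man_year = INSTR_PER_MAN_MONTH * 12
coding_man_years = machine_words / instr_per_man_year

# If coding is 15% of the job, the whole job and the non-coding remainder are:
total_man_years = coding_man_years / CODING_SHARE
non_coding_man_years = total_man_years - coding_man_years

print(machine_words, coding_man_years, round(total_man_years, 1))
```

The non-coding remainder of roughly 57 man years matches the text's split of about 30 man years for checkout and test plus about 27 for analysis and design.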

Suppose two years' time were allotted for the above example. A reasonable preliminary schedule
might be to start with a design group of 27 working during the first year. After 6 months, the analysis
group should have enough work done so that coding can begin, so a group of 10 programmers joins the
project after 6 months and should complete its work at the 18-month mark. After about 9 months
a group of 10 programmers could form a test team to start testing and debugging the first sections of
code which have been produced. After about 1 year, another group of 17 programmers could join and
form one or more additional test teams and begin debugging the new sections of code being written.
Notice that at the peak manpower level about 64 people are working on the project!! If we assume that
with overhead the salary of an analyst is $40,000 and that of a programmer is $30,000, the overall labor
cost of such a piece of software is over $2,000,000. (A comprehensive treatment of software cost
estimating appears in Refs. 38 and 43.)
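One way to tally this schedule is sketched below. Several details are assumptions added for illustration: both test teams are taken to run until the end of the two-year allotment, the design group is costed at the analyst salary and all other staff at the programmer salary, and the peak of 64 is taken as the simple sum of the four groups.

```python
# (group, people, start_month, end_month, annual_salary_dollars)
GROUPS = [
    ("analysis and design", 27, 0, 12, 40_000),
    ("coding", 10, 6, 18, 30_000),
    ("test team 1", 10, 9, 24, 30_000),    # end month assumed
    ("test team 2", 17, 12, 24, 30_000),   # end month assumed
]


def man_years(people, start_month, end_month):
    return people * (end_month - start_month) / 12


effort = {name: man_years(p, s, e) for name, p, s, e, _ in GROUPS}
total_cost = sum(man_years(p, s, e) * salary for _, p, s, e, salary in GROUPS)
total_staff = sum(p for _, p, *_ in GROUPS)
test_man_years = effort["test team 1"] + effort["test team 2"]

print(effort, total_cost, total_staff)
```

Under these assumptions the test effort comes to 29.5 man years, close to the 30 estimated earlier, and the labor cost is consistent with the statement that it exceeds $2,000,000.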

We now leave the subject of schedules and turn to some of the other measures of system performance
which are used to evaluate a program: core size (number of machine language instructions), running time
of the program, load factors, ease of change, and portability. When we say core size, we mean memory
in general, and with the advent of low cost electronic memory and inexpensive disk storage units, this is a
less significant constraint than it once was. However, in space and aircraft applications where memory
size is fixed by power, weight, and volume considerations, investigations have found that if the
programmers have a fixed core size which is too small for their initial design, then the tricks they use to
"squeeze" the program into the core they have lead to great problems with errors and debugging. (See
Fig. 1-3.) The influence of program run time on reliability has received little attention. It seems
obvious that if there is a very tight maximum on program run time (sometimes correctly and sometimes
erroneously specified), additional program tricks will be required which again contribute to the error
and debugging problems. The implication is that not only will the debugging costs increase but that more
sophisticated debug-resistant errors may remain in the released version of the software to plague one
during operation. Many people have observed that the effect of increased load on a system is to cause
more frequent software errors. Most people using time-sharing systems feel that the mean time between
system crashes is strongly dependent on the number of users. (An interesting and definitive description
of the software development process appears in Ref. 39.)

2.0 DEFINITION OF A SOFTWARE BUG

2.1 Problems in Definition

Definitions are always very difficult in the field of reliability, since we are trying to model a
diffuse and complex physical situation by a mathematical model. The concept of how to count multiple,
repeated, and transient errors in hardware is most difficult. The author knows of cases where, in a
military contract, the contractor had to perform a reliability test of h hours on the equipment in question. If
x or fewer failures occurred during the test, then the equipment passed. A board was
created composed of government and contractor engineers. This failure board considered the test results
and voted on each occurrence of abnormal behavior noted in the log book kept during the test to determine
whether the occurrence should count as a failure. The most difficult case to decide on was where a
certain logic card type failed m times. If this could be shown to be a simple repetition of the same transient
failure, it would only count once, whereas if it was m separate failures, it would count m times. Often
voting on the board was along strict "party lines," but often there were honest differences of engineering
opinion.


If the definition of failure is difficult in the case of hardware where we have more experience and
theoretical guidelines, it is an even more exacting task in the case of less well understood software. The
author hopes the definitions proposed in the remainder of this section are a step toward a set of working
definitions. At least they raise the salient issues in the reader's mind if he wishes to formulate his
own definitions.




2.2 Definition of a Bug

The following definition of a software bug is proposed:

"One or more software bugs exist in a system if a software change is required to correct a single major
error or minor error so as to meet specified or implied system performance requirements."

The following definitions are embedded within the above definition:

Change - Any alteration (addition, deletion, correction) of the program code whether it be a single
character or thousands of lines of code. Changes made to improve documentation or satisfy new specifications
are important to record and study but are not counted as bugs.

Major error - A catastrophic event which interrupts or could interrupt most or all major system functions,
e.g., an infinite loop, system crash, a major memory overflow or data base corruption, etc.

Minor error - A marginal event which allows or could allow some portions of the system to operate
properly while interrupting others, e.g., some missing output, some wrong output, an inaccurate computation,
a recoverable transient error, etc.

Specified performance requirements - A written requirement, figure of merit, or parameter which
qualitatively or quantitatively defines system performance.

Implied system performance - An unwritten requirement which is understood by the majority of the
project team to be essentially equivalent to a written requirement.

Inherent in the above definitions and discussion is the assumption that errors can be and are
detected and recorded. The detection of errors can be effected by monitoring the system (or simulated
system) performance or by reading the code. Furthermore, it is assumed that each error is sufficiently
well investigated so that it can be classified as hardware, software, operator, or unknown, and that the
unknown category is small, say less than 20%.

2.3 Multiple Bugs and Changes

In this section we amplify the previous definition in order to classify the cases of repetitious
errors, multiple errors corrected by one change, and multiple changes to correct one error. We now
introduce the concept of internal and external errors. An external error is a performance error of the
system which is generally detected by executing the code. Theoretically, an external error could also be
found by reading the code ("eyeballing"). An internal error is a coding error which is always found by
reading the code either by man or machine.

The programmer could have initiated code reading due to one or more of the following factors:

(1) an external error has been detected, and he is trying to find the corresponding internal error.

(2) he was reading the code to verify a program change before submitting it to the computer.

(3) he was performing a code reading test to detect errors.

(4) a colleague told him of an actual or potential error.

We may think of internal errors as causes and external errors as effects. Thus, if a single internal
error results in an associated single external error, we call it a single bug. If an internal error results in
a minor or no detectable external error, then no bug exists. If an external error exists, and we are sure
it is a software problem, then a bug exists regardless of whether or not we can find the corresponding
internal error.

One important class of external errors for which no internal error can be found is transient
errors. These exist for too short a time for isolation of the cause. A transient error which is found is no
longer transient. If, before the internal error is found, the transient occurs m times, it is still only
counted as a single error if the symptoms are the same.

Theoretically,  many  internal  errors  can  combine  to  cooperatively  cause  one  external  error.  We 
expect  that  this  is  an  event  with  a low  probability  of  occurrence.  Since  only  a single  external  error  exists, 
this  would  be  classified  as  one  bug. 

A more  common  multiple  error  case  would  be  where  one  internal  error  causes  m external  errors. 
Initially,  if  the  m external  errors  are  known,  but  no  corresponding  internal  error  has  been  found,  then  we 
classify  this  result  as  m bugs.  At  a later  time,  several  days  or  weeks  hence,  when  the  common  internal 
error is found, we decide to reclassify this result as a single bug. Thus, if we are recording the cumulative
number of bugs frequently, say each day, then the above event would first count as an increase of m
bugs, and later when the single common internal error was found, there would be a decrease of m-1 bugs.
Of course, if the data were taken less frequently, the diagnosis of the internal error and the external error
would fall within the same time interval and only the net result, i.e., one bug, would be recorded. In
any event we could probably treat the number of bugs as defined herein as an upper bound if multiple
external errors occur frequently.
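
The counting convention for one internal error with m external symptoms can be sketched as simple bookkeeping. The following is a minimal illustration only; the event names `"external"` and `"common_cause"` and the function name are hypothetical and are not part of the original definitions.

```python
def cumulative_bug_counts(events):
    """Return the running bug count after each recorded event.

    events is a list of (kind, m) pairs (hypothetical record format):
      ("external", m)     -> m new external errors with no internal error found yet
      ("common_cause", m) -> one internal error found that explains m external
                             errors which were already counted as m bugs
    """
    count = 0
    history = []
    for kind, m in events:
        if kind == "external":
            count += m          # each unexplained external error counts as one bug
        elif kind == "common_cause":
            count -= (m - 1)    # m bugs are reclassified as a single bug
        else:
            raise ValueError("unknown event kind: " + kind)
        history.append(count)
    return history


# Three external errors are logged, then traced to one common internal error.
print(cumulative_bug_counts([("external", 3), ("common_cause", 3)]))
```

If the log is sampled only after both events, just the net result (one bug) is seen, which is the coarse-sampling case described above.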
2.4 Old or New Bugs

It  is  useful  in  trying  to  model  the  dynamics  of  the  debugging  process  to  know  whether  a bug  is  old 
or  new.  To  be  more  precise  we  might  use  the  terminology  previously  corrected  bugs  and  generated  bugs. 

A previously corrected bug is one which reoccurs in substantially the same form after the programmer
terminated his work on a code change believing that the error was corrected. A conclusive decision
that a bug was a previously corrected one can only be made based on an internal error diagnosis.

A generated  error  is  one  which  did  not  exist  until  it  was  created  as  a by-product  of  a code  change 
made  to  correct  some  bug.  A generated  error  is  usually  best  diagnosed  by  finding  an  internal  error. 
However,  it  is  sometimes  possible  to  base  such  a classification  on  an  external  error,  i.  e.  if  a newly 
created  variable  appears  in  the  wrong  output  form.  Of  course,  all  the  above  definitions  rely  on  subjective 
judgment  of  the  programmers.  However,  it  is  hoped  that  qualified  personnel  could  use  these  definitions 
to  obtain  repeatable  results. 

3.0 DEFINITION OF SOFTWARE RELIABILITY
3.1 Factors in the Definitions


In  the  early  days  of  hardware  reliability  there  was  much  major  soul  searching  and  thrashing  about 
until  a widely  accepted  definition  of  reliability  was  formulated.  In  the  preceding  section  some  working 
definitions  of  software  bugs  were  developed.  In  this  section  we  will  attempt  to  develop  a working  definition 
of  software  reliability. 

Our experience in the hardware area has taught us that reliability must be defined as the probability
that some event occurs over a period of time (operating time). In our case the event is success of
the software, i.e., error free software operation. We are now helped by the definitions of the previous
section in defining the event error free software operation. Error free software operation over the
operating time interval 0 to t means that no software bug (external software error) occurred over that
interval. Clearly, if we are to define an external error, we must define what constitutes successful
performance.


Another factor which must be specified is the hardware environment. This suggests certain obvious
factors as well as more subtle ones. Obviously, a FORTRAN program written for use on an IBM computer
will probably need modification before it can be used on a UNIVAC computer. We find more subtle
differences when a computer is being specially designed for a project and the software and hardware are
to be mated on the laboratory prototype. Often there are last minute changes which make for significant
differences between the prototype and the field installation. Also, as an economy measure, the prototype
may be a smaller configuration than the actual field installation. We are now faced with the situation where
the final tests on the hardware complex will be carried out on a computer which differs from the
operational model.

Finally,  we  must  be  concerned  about  the  intended  task  the  system  must  perform.  Although  there 
are always extensive efforts to write comprehensive specifications, problems occur. For example,
returning to the previously cited APOLLO program, it was decided that the astronauts would never enter a
call for the wrong program during the orbit phase of flight so no error checking for program calls was
incorporated in the software. Subsequently, an astronaut while in orbit called for the ground initialization
and  alignment  program.  Would  you  call  this  a software  error? 

In  many  cases  a system  is  originally  sized  for  a particular  data  input  rate.  As  the  system  begins 
to function successfully in the field after elimination of the initial bugs, there is a trend to employ it more
widely.  The  error  rate  of  the  system  increases  as  the  data  input  rate  increases.  A classic  example  of 
this  effect  is  the  initial  deployment  of  the  SABER  Airlines  Reservation  System.  Every  time  a new  group 
of  terminals  was  added  to  the  system  from  a new  city  or  group  of  cities  a new  crop  of  bugs  was  encountered. 
The  system  was  allowed  to  grow,  in  stages,  at  a controlled  rate  by  establishing  an  upper  failure  rate 
limit  and  a lower  failure  rate  limit.  When  debugging  removed  enough  errors  so  that  the  system  reached 
the  lower  failure  rate  limits,  new  terminals  were  added  until  the  upper  limit  was  reached.  As  soon  as  the 
debugging  team  reduced  the  failure  rate  to  the  lower  limit,  the  cycle  was  repeated. 
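
The staged growth policy described above behaves like a two-limit (hysteresis) controller. A minimal sketch follows, assuming hypothetical mode names `"grow"` and `"debug"` and hypothetical limit values; the source does not state the actual limits used.

```python
def update_mode(mode, failure_rate, lower_limit, upper_limit):
    """Decide the next project mode from the current failure rate.

    "grow"  : new terminals are being added (failure rate tends to rise)
    "debug" : expansion is paused while errors are removed (rate tends to fall)
    """
    if mode == "grow" and failure_rate >= upper_limit:
        return "debug"   # upper limit reached: stop adding terminals
    if mode == "debug" and failure_rate <= lower_limit:
        return "grow"    # lower limit reached: resume adding terminals
    return mode          # between the limits: keep doing the same thing


# Hypothetical limits, in failures per week.
mode = "grow"
for rate in [2.0, 4.0, 6.0, 4.0, 1.0]:
    mode = update_mode(mode, rate, lower_limit=1.0, upper_limit=6.0)
    print(rate, mode)
```

Using two limits rather than one keeps the system from switching between expansion and debugging on every small fluctuation of the failure rate.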

3.2 Definition


A definition  for  software  reliability  is  given  below  in  keeping  with  the  factors  discussed  above. 

This  definition  is  a slight  modification  of  one  given  by  Hesse®: 

"Software reliability is defined as the probability that a given software program operates for some
time period, without an external software error, on the machine for which it was designed given that it is
used within design limits."


Once  we  have  related  reliability  to  a probability,  as  in  the  above  definition,  the  mathematical 
basis  of  the  measure  is  well  founded.  Of  course,  the  problems  in  interpreting  terms  such  as  external 
error and design limits still exist.


3.3 Reliability Theory

The following brief development of the reliability function, hazard function, and mean time to
failure is included for those unfamiliar with reliability theory. We begin with the standard probability
functions

$$R(t) = P(\mathbf{t} > t) \tag{3-1}$$

$$F(t) = 1 - R(t) \tag{3-2}$$

$$f(t) = -\frac{dR(t)}{dt} \tag{3-3}$$

where

$\mathbf{t}$ = the random variable time to failure,

$t$ = a particular value of the random variable,

$R(t)$ = the reliability function, which yields the probability of no failure in the interval 0 to t,

$F(t)$ = the cumulative distribution function, which yields the probability of failure in the
interval 0 to t,

$P(\mathbf{t} > t)$ = the probability that the time to failure lies outside the interval 0 to t.


Workers in the field of reliability have found it convenient to define a different conditional
probability function called the failure rate or hazard function, z(t). One may define z(t) in a manner
analogous to the definition of f(t) in terms of the probability that a failure occurs in the interval t to
$t + \Delta t$.

Probability of a failure in the interval t to $t + \Delta t$:

$$P(t < \mathbf{t} \le t + \Delta t) = f(t)\,\Delta t \tag{3-4}$$

Probability of a failure in the interval t to $t + \Delta t$, given the fact that failure did not occur prior
to t:

$$P(t < \mathbf{t} \le t + \Delta t \mid \mathbf{t} > t) = z(t)\,\Delta t \tag{3-5}$$

From these definitions it can be shown that

$$z(t) = \frac{f(t)}{R(t)} = -\frac{1}{R(t)}\frac{dR(t)}{dt} \tag{3-6}$$

Solving this differential equation for R(t) subject to the initial condition that the item is initially
good, i.e., R(t=0) = 1, yields

$$R(t) = e^{-\int_0^t z(x)\,dx} . \tag{3-7}$$

Another measure which is often used is the mean time to system failure, MTTF. This is
simply given as the first moment of the random variable $\mathbf{t}$.

$$\mathrm{MTTF} = \int_0^{\infty} t\,f(t)\,dt \tag{3-8}$$

It can be shown that Eq. 3-8 can be reduced to the simpler computational form

$$\mathrm{MTTF} = \int_0^{\infty} R(t)\,dt \tag{3-9}$$

For the simple case where z(t) is a constant, Eqs. (3-7) and (3-8) yield

$$z(t) = \lambda \tag{3-10}$$

$$R(t) = e^{-\lambda t} \tag{3-11}$$

$$\mathrm{MTTF} = 1/\lambda \tag{3-12}$$
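
As a numerical check of Eqs. 3-7 and 3-9, the sketch below tabulates R(t) from an arbitrary hazard function by trapezoidal integration and then integrates R(t) to approximate the MTTF. It is an illustration only: the function names, the step count, the truncation horizon, and the value of the constant hazard are assumptions made here, not part of the original development.

```python
import math


def reliability_curve(hazard, horizon, steps=20000):
    """Tabulate R(t) = exp(-integral_0^t z(x) dx) on [0, horizon] (Eq. 3-7)."""
    h = horizon / steps
    times, rel = [0.0], [1.0]   # the item is initially good: R(0) = 1
    area = 0.0                  # running trapezoidal integral of z(x)
    for k in range(1, steps + 1):
        area += 0.5 * (hazard((k - 1) * h) + hazard(k * h)) * h
        times.append(k * h)
        rel.append(math.exp(-area))
    return times, rel


def mttf(times, rel):
    """Approximate MTTF = integral of R(t) dt (Eq. 3-9), truncated at times[-1]."""
    return sum(0.5 * (rel[k - 1] + rel[k]) * (times[k] - times[k - 1])
               for k in range(1, len(times)))


lam = 0.5  # hypothetical constant hazard, failures per hour
times, rel = reliability_curve(lambda t: lam, horizon=40.0)
print(rel[1000], math.exp(-lam * times[1000]))  # tabulated R(t) vs. Eq. 3-11
print(mttf(times, rel), 1.0 / lam)              # numerical MTTF vs. Eq. 3-12
```

The same two functions accept a time-varying hazard, which is convenient when comparing the constant-hazard case with models in which z(t) grows or decays with operating time.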

4.0  ERROR  DATA  AND  MODELS 


4.1 Introduction


When one attempts to apply probability and statistics to an engineering problem, two approaches
immediately suggest themselves. The pure statistical approach is to define the variables
and performance measure(s) and construct and carry out an experiment. The experimental results
are statistically analyzed to determine quantitatively what relationships (if any) exist among the
performance measure(s) and the variables. The other approach is to formulate a probabilistic hypothesis
about how the variables interact and write a corresponding equation relating the performance measure(s)
to the variables. Based on these models, experiments are planned to verify the hypotheses and to
determine the constants in the equations. The reader may wish to view these approaches as analogous to
the  distinct  approaches  of  a theoretical  and  an  experimental  physicist.  In  either  case  it  is  appropriate  to 
begin  by  examining  the  data  available  in  the  literature  at  the  outset. 


The first relevant question is what data we should examine. If we ask experienced software managers
what quantitative measures they use to gauge the progress of a software program, they either answer
"none" or refer to graphs of the cumulative number of errors removed from the software. Those who believe
this is a significant measure look for the slope of the curve to approach zero before deciding the software
is sufficiently debugged for release. This section of the paper discusses some of the sparse experimental
data available in the open literature on the number of errors removed from a computer program. The next
section builds upon this data and certain hypotheses to evolve a probabilistic error model.

Also  of  interest  are  the  error  types  and  frequencies  of  occurrence.  Unfortunately,  only  a small 
amount  of  data  has  been  accumulated  in  this  area,  and  until  the  gross  models  (such  as  those  discussed 
above  which  treat  all  errors  alike)  have  been  verified,  such  refinement  in  a model  is  perhaps  unjustified. 

4.2 Post Release Data

As  is  the  case  with  all  studies,  data  is  difficult  and  generally  costly  to  obtain.  Good  and  complete 
records  are  not  kept  in  most  situations.  Record  keeping  is  generally  better  in  the  case  of  large  military 
and  space  programs  and  after  a release  of  a large  commercial  operating  system.  Although  a fairly  large 
amount  of  such  data  exists,  military  secrecy  and  industrial  proprietary  policies  inhibit  its  publication  in 
many  cases.  Some  of  this  data  which  has  been  published  appears  in  Refs.  8 and  10. 

Assume  that  a typical  operating  system  for  a large  computer  is  undergoing  continual  development 
and  that  new  features  and  capabilities  are  being  added.  The  manufacturer's  development  group  deals  with 
a continually  changing  product,  but  external  versions  (generally  called  releases)  are  only  made  available 
periodically,  say  every  6 months.  Although  the  manufacturer  tries  to  thoroughly  test  each  release,  the 
exercising of the program by a fair proportion of the large and diverse user community is more
comprehensive than any test he can devise. Consequently, soon after release of a new version, the number of
errors found per month (error rate) rises rapidly to a peak. As these are diagnosed and corrected, the
number of residual errors decreases and the error rate begins to decrease. When a new release is
distributed, this behavior is repeated. Such typical behavior is sketched in Fig. 4-1. Note that the vertical
axis is normalized by dividing by the total number of machine language instructions. This should allow
us to compare both large and small programs to see if there is a behavior pattern independent of size.
Detailed data on the normalized number of errors since release for three different supervisory systems
(operating systems) is given in Fig. 4-2. Note that the horizontal axis units are months of debugging, $\tau$.
In this case $\tau$ is identical with operating time, t; however, this is not always the case.

Note that the shapes depicted in Fig. 4-2a, b, c vary. If we assume that the number of remaining
errors decreases monotonically and that the error discovery rate is proportional to the number of remaining
errors, exponential decay is obtained. This explains in a gross way the "tail" of the curves. The initial
behavior may be due to the fact that initially only a few installations are using the new release, and it is
not until a few months later that a sizeable proportion of users have instituted this software. Thus, it
might be more appropriate to let $\tau$ represent a more general resource variable, such as user-months,
to serve as a more realistic horizontal scale parameter. (See Ref. 10 for a more detailed discussion of
these curve shapes.)
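
The exponential tail follows directly from the proportionality assumption. As a brief sketch, let $n(\tau)$ denote the number of remaining errors and $k$ the constant of proportionality; both symbols are introduced here only for illustration and do not appear in the original development.

```latex
\frac{dn(\tau)}{d\tau} = -k\,n(\tau)
\quad\Longrightarrow\quad
n(\tau) = n(0)\,e^{-k\tau},
\qquad
\text{error discovery rate} = -\frac{dn(\tau)}{d\tau} = k\,n(0)\,e^{-k\tau}.
```

Under this assumption both the remaining errors and the discovery rate decay exponentially, which accounts for the tail of the curves but not for their initial rise.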

In Fig. 4-3 the error rate curves for four applications programs are presented. In this case the
origin $\tau = 0$ represents the start of program integration where all the individual modules of code are put
together to form a system. As is well known, at this point incompatibilities between the modules crop up
and a new set of interface errors must be debugged. We may employ the same argument used previously
to describe the tails of the curves in Fig. 4-3. Also, if we think of $\tau$ as a general resource variable which
is a function of man-hours of debugging and computer test hours, this may explain the initial behavior.
Again, no data is available to test this hypothesis.

4.3 Error Model

Referring  to  the  data  discussed  in  the  previous  section  we  see  that  although  the  curve  shapes  differ, 
the  vertical  and  horizontal  scales  are  similar.  Based  on  this  result  we  can  proceed  to  formulate  a general 
error  model  using  the  number  of  machine  language  instructions  as  a normalizing  factor.  * 

Basically, the error model used in this paper assumes that the total number of errors in the program
is fixed and that if we record the cumulative number of errors corrected during debugging, then the
difference  represents  the  remaining  errors.  The  following  section  on  reliability  models  will  relate  the 
probability  of  encountering  a software  bug  to  the  number  of  residual  bugs. 

The normalized error rate is defined as*

$$\rho(\tau) = \text{errors/total number of instructions/month of debugging time.} \tag{4-1}$$

Thus, Figs. 4-1, 4-2, and 4-3 are plots of $\rho(\tau)$ vs. $\tau$.

Since we are interested in the total number of errors removed, we will define a cumulative error
curve, $\epsilon(\tau)$, which is the area under the $\rho(\tau)$ curve:

$$\epsilon(\tau) = \int_0^{\tau} \rho(x)\,dx = \text{cumulative errors/total number of instructions} \tag{4-2}$$


and $\rho(\tau)$ is of course the slope of the $\epsilon(\tau)$ curve:

$$\rho(\tau) = d\epsilon(\tau)/d\tau \tag{4-3}$$

* Recent work (see Refs. 40, 41) suggests that instead of the total number of instructions, a better measure
of program length is the total number of operators and operands in the program.

A curve of the cumulative error data for the supervisory system A of Fig. 4-2 is shown in
Fig. 4-4. If similar curves for $\epsilon(\tau)$ were drawn for the other examples of Figs. 4-2 and 4-3, all would
start at zero, increase slowly, then move rapidly, and finally, more slowly, approaching a slowly
increasing or zero rate. Because of this similarity in behavior, a cumulative curve such as $\epsilon(\tau)$ is not too
useful to depict differences in behaviors; thus, the derivative curve, $\rho(\tau)$, is more useful for this purpose.
Both curves are needed for a detailed study.

If we assume that the total number of errors in the program, $E_T$, is constant, that the program
contains $I_T$ total instructions, and that no new errors are added during debugging,* then the asymptote
which the $\epsilon(\tau)$ curves approach is $E_T/I_T$. If we assume that all detected errors are corrected errors,
so that the cumulative curve $\epsilon(\tau)$ is also the corrected-error curve $\epsilon_c(\tau)$, then by inspection of
Fig. 4-4, we can write an expression for the number of residual errors:

$$\epsilon_r(\tau) = (E_T/I_T) - \epsilon_c(\tau) \tag{4-4}$$

We assume that in any sizeable program it is impossible to remove all errors, so

$$\epsilon_c(\tau) < E_T/I_T \tag{4-5}$$

$$\epsilon_r(\tau) > 0 . \tag{4-6}$$

Also since we assume that most programs eventually reach a reasonably debugged state, we may assume
that for large $\tau$, $\epsilon_r(\tau)$ is small.
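
Equations 4-2 and 4-4 can be illustrated with a short sketch that accumulates monthly normalized error rates into the cumulative corrected-error curve and subtracts it from the normalized total. The monthly rates, the total error count, and the program size below are invented for illustration and are not taken from Figs. 4-2 to 4-4; the function names are likewise hypothetical.

```python
def cumulative_errors(rates, dt=1.0):
    """Rectangular-rule version of Eq. 4-2.

    rates[k] is the normalized error rate (errors/instruction/month)
    observed during the k-th interval of length dt months.
    """
    total = 0.0
    curve = []
    for rate in rates:
        total += rate * dt
        curve.append(total)
    return curve


def residual_errors(total_errors, instructions, corrected):
    """Eq. 4-4: normalized residual errors = E_T / I_T - corrected errors."""
    return total_errors / instructions - corrected


# Hypothetical data: 500 errors in a 100,000-instruction program.
monthly_rates = [0.001, 0.002, 0.001]
corrected_curve = cumulative_errors(monthly_rates)
for month, corrected in enumerate(corrected_curve, start=1):
    print(month, corrected, residual_errors(500, 100000, corrected))
```

Because E_T is not known in advance, the residual curve can only be drawn once E_T has been estimated, which is the purpose of the parameter estimation discussed later.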

In order to test the hypothesis that the normalized behaviors of $\epsilon(\tau)$ and $\rho(\tau)$ hold for a wide
variety of program sizes we make the following comparisons with the data in Figs. 4-2 and 4-3: (1)
In order to test the hypothesis that the normalized number of errors $E_T/I_T$ is somewhat constant for
a variety of programs, we compute the ratio and compare the results. (2) An allied hypothesis is that
debugging proceeds at a roughly similar average rate $\rho_0$ over an entire project. The results are given
in Table 4-1. The value of $E_T/I_T$ varies about the average by +48% and -31% and that of $\rho_0$ varies about
the average by +75% and -31%. Note that all these programs are about 1/4 million machine language
statements in size. The data is often "dirty" since in some projects only program corrections are counted;
whereas, in others specification and improvement changes are lumped in with actual error changes.

Furthermore, the applications programs presented debugging information during the program
integration phase of software development; whereas, the supervisory programs reported errors after release.
It is not unreasonable that the errors found during system integration and after release of a large software
package are roughly commensurable. Based on three software programs, it has been shown that the ratio
of changes after release to changes during integration and test was about 0.8. If we compare the
average value of $E_T/I_T$ for the supervisory programs to that for the application programs in Table 4-1, the
ratio is about 0.7. Based on the above factors the data appears to verify the hypothesis that $E_T/I_T$ and $\rho_0$
are approximately constant for similar size programs. We now present similar data for small programs
in Table 4-2. In this case the values of both $E_T/I_T$ and $\rho_0$ vary about the average by +79% and -36%.

The data in Table 4-2 includes data taken during module test as well as during integration testing
and, as might be expected (because of the two phases being lumped), the average value of $E_T/I_T$ for
small programs data is 2.15 times larger than the large program data whereas the value of $\rho_0$ is
[illegible] times larger. Drawing these various facts together allows us to state that within a factor of
perhaps 2, the values of $E_T/I_T$ and $\rho_0$ are constant for a wide variety of programs. Furthermore, within a
similar factor, the number of bugs per machine instruction found during module test, integration testing,
and after release are roughly the same.

One further comment is in order before we leave the subject of error models. Some experienced
programmers have challenged the assumption that no new errors are generated during debugging. In
Fig. 4-5 three dynamic debugging behaviors are illustrated. In Fig. 4-5a no new errors are added, and
the situation depicted is just the one which we have been discussing. In Fig. 4-5b errors are added;
however, the removal rate exceeds the generation rate and equilibrium is obtained. If the number of errors
added is small percentage wise, even cases (a) and (b) are approximately numerically equivalent.

Fig. 4-5(c) depicts a case where the error generation rate exceeds the error removal rate and the process
diverges. A newly devised model, formulated by this author and his co-workers, describes error
generation in cases (b) and (c), and is discussed and developed in Reference 34.
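
The three behaviors can be made concrete with a toy difference equation. This is not the model of Reference 34, whose form is not given here; it is a hypothetical sketch in which each debugging step corrects a fixed number of errors and each correction generates, on average, a fixed fraction of a new error.

```python
def remaining_errors(initial, removed_per_step, generated_per_removal, steps):
    """Toy model (an assumption, not the Ref. 34 model) of debugging dynamics.

    Each step corrects up to removed_per_step errors, and every correction
    introduces generated_per_removal new errors on average.
    """
    remaining = float(initial)
    history = [remaining]
    for _ in range(steps):
        fixed = min(removed_per_step, remaining)
        remaining = remaining - fixed + generated_per_removal * fixed
        history.append(remaining)
    return history


print(remaining_errors(10, 2, 0.0, 5))   # case (a): no new errors are added
print(remaining_errors(10, 2, 0.5, 5))   # case (b): removal exceeds generation
print(remaining_errors(10, 2, 1.5, 5))   # case (c): generation exceeds removal
```

In this toy model the error count falls whenever fewer than one new error is generated per correction and grows without bound otherwise, which mirrors the converging and diverging cases of Fig. 4-5.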

5.0  RELIABILITY  MODELS 

5.1 Introduction

In order to formulate a reliability model, one can take a microscopic or a macroscopic approach.

In the microscopic approach we would try to identify individual bugs (either deterministically or
probabilistically), the type of bug, the path in the program, and how frequently the path is traversed. Initial
attempts along these lines have convinced this author that such an approach, while necessary in the long
run, involves a more detailed knowledge of program structure and bug types than is now available. The
macroscopic approach where all bugs are lumped and treated equally will be employed here. The validity
of the result depends on considerable "averaging" occurring in a large program.


* This model has recently been extended to include error generation terms. (See Ref. 34.)

5.2 Basic Assumptions

We assume that operational software errors occur due to the occasional traversing of a portion of
the program in which a hidden software bug is lurking. We begin by writing an expression for the
probability that a bug is encountered in the time interval $\Delta t$ after t successful hours of operation. This must be
proportional to the probability that any randomly chosen instruction contains a bug, i.e., the fractional
number of remaining bugs $\epsilon_r(\tau)$.

From a study of basic probability and reliability theory (Ref. 9), we learn that the probability of failure in
time interval t to $t + \Delta t$, given that no failures have occurred up to time t, is proportional to the failure
rate (hazard function z(t)).

$$P(t < t_f \le t + \Delta t \mid t_f > t) = z(t)\,\Delta t = K\epsilon_r(\tau)\,\Delta t \tag{5-1}$$

where

$t_f$ = operating time to failure (occurrence of a software error),

$P(t < t_f \le t + \Delta t \mid t_f > t)$ = probability of failure in interval $\Delta t$, given no previous failure,

K = an arbitrary constant.*

Note that in Eq. 5-1 two time variables appear: first there is t, the operating time in hours of the system,
and second there is $\tau$, the debugging time in months (or more generally, the debugging resource variable).
Once the assumptions in Eq. 5-1 have been made, the reliability and mean time to failure functions follow
directly.

5.3 Reliability Model

By combining Eqs. 5-1 and 3-7 and assuming that K and $\epsilon_r(\tau)$ are independent of operating time t,
we obtain for the reliability function

$$R(t) = e^{-[K\epsilon_r(\tau)]t} \tag{5-2}$$

Basically the above equation states that the probability of successful operation without software bugs
is an exponential function of operating time. When the system is first turned on, t = 0 and R(0) = 1. As
operating time increases the reliability monotonically decreases as shown in Fig. 5-1. We depict the
reliability function for three values of debugging time, $\tau_0 < \tau_1 < \tau_2$. From this curve we may make various
predictions about the system reliability. For example, looking along the vertical line t = 1/v we may state:

1. If we spend $\tau_0$ hours of debugging, then R(1/v) = 0.35

2. If we spend $\tau_1$ hours of debugging, then R(1/v) = 0.50

3. If we spend $\tau_2$ hours of debugging, then R(1/v) = 0.75
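
The dependence of reliability on both time variables can be sketched numerically. The example below evaluates Eq. 5-2 with the residual errors of Eq. 4-4; the values of K, E_T, I_T and the linear corrected-error curve are hypothetical choices made only to show the trend, and do not reproduce the numbers read from Fig. 5-1.

```python
import math


def software_reliability(t, tau, K, E_T, I_T, corrected):
    """Eq. 5-2 with Eq. 4-4: R(t) = exp(-K * (E_T/I_T - eps_c(tau)) * t).

    t         : operating time (hours)
    tau       : debugging time (months)
    corrected : function returning the normalized corrected errors eps_c(tau)
    """
    residual = E_T / I_T - corrected(tau)
    return math.exp(-K * residual * t)


# Hypothetical parameters: 1000 total errors, 100,000 instructions, K = 100,
# and errors corrected at a constant normalized rate of 0.001 per month.
def corrected(tau):
    return 0.001 * tau


for tau in (0.0, 5.0, 8.0):
    print(tau, software_reliability(1.0, tau, 100.0, 1000, 100000, corrected))
```

For a fixed operating time the computed reliability rises as more debugging time is spent, which is the ordering of the three curves described for Fig. 5-1.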

5.4 MTTF Model

A simpler way to summarize the results of the reliability model is to compute the mean time to
(software) failure, MTTF, by substituting Eq. 5-2 into Eq. 3-9.

$$\mathrm{MTTF} = \frac{1}{K\epsilon_r(\tau)} \tag{5-3}$$

If we let $\epsilon_c(\tau)$ be modeled by a constant rate of error correction $\rho_0$, i.e., $\epsilon_c(\tau) = \rho_0\tau$ (see Ref. 10
for other models), then solution of Eqs. 5-3 and 4-4 yields

$$\mathrm{MTTF} = \frac{1}{\beta(1 - \alpha\tau)} \tag{5-4}$$

where $\beta = K E_T/I_T$ and $\alpha = \rho_0 I_T/E_T$.
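
A short sketch shows why the MTTF improves most sharply near the end of debugging. With a constant correction rate, Eqs. 5-3 and 4-4 give an MTTF proportional to 1/(1 - x), where x is the fraction of the total errors already corrected. The variable x and the function name are shorthand introduced here for illustration.

```python
def normalized_mttf(fraction_corrected):
    """MTTF relative to its value before any debugging, for a constant correction rate.

    From Eqs. 5-3 and 4-4, MTTF = 1 / (K * (E_T/I_T) * (1 - x)),
    so MTTF / MTTF(x=0) = 1 / (1 - x), with x the fraction of errors corrected.
    """
    if not 0.0 <= fraction_corrected < 1.0:
        raise ValueError("fraction_corrected must lie in [0, 1)")
    return 1.0 / (1.0 - fraction_corrected)


for x in (0.0, 0.25, 0.5, 0.75, 0.9):
    print(x, normalized_mttf(x))
```

Correcting the first three quarters of the errors raises the MTTF by a factor of four, while the remaining quarter accounts for all further growth.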


In Fig. 5-2, $\beta \times \mathrm{MTTF}$ is plotted vs. $\alpha\tau$. We see that the most improvement in MTTF occurs during
the last 1/4 of the debugging.

* In earlier work (see Ref. 10 or 12) an attempt was made to [illegible] by splitting
K into two factors, K', an arbitrary constant, and [illegible]. This elaboration
is not included here since, to date, no data has been [illegible].

5.5 Other Models

Other similar models have been proposed in the literature. Jelinski and Moranda (Ref. 13) propose a
hazard function of the form
$$z(t) = \phi\,[N - (i-1)] \tag{5-5}$$

where: $\phi$ = constant of proportionality,
N = total number of errors present,
i = number of errors found by debugging time $\tau$.

Comparison of Eq. 5-5 with Eqs. 5-1 and 4-4 shows them to be identical if

$$\phi = K/I_T \tag{5-6}$$

$$N = E_T \tag{5-7}$$

$$\epsilon_c(\tau) = (i-1)/I_T \tag{5-8}$$

Equations 5-6 and 5-7 are merely notational differences and Eq. 5-8 is nearly the same (i.e., would
be identical if $\epsilon_c(\tau) = i/I_T$).

In another paper Schick and Wolverton (Ref. 14) modify Jelinski and Moranda's model and assume that the
failure rate is proportional to the number of remaining errors and increases with operating time t:

$$z(t) = \phi\,[N - (i-1)]\,t \tag{5-9}$$


One rationale for postulating an increase in z(t) would be if operation were viewed as a succession of
different trials which gradually closes in on the remaining errors (sampling without replacement). However,
one could argue to the contrary that z(t) should decrease with t, since the latter errors are the subtle ones
which take a long while to encounter in operation. The author believes that in most cases of large,
intricate, well tested, real-time systems the hazard will remain constant once the initial field debugging of a
new release is finished. The small number of subsequent patches generated should not be significant.
Failure should be caused by rare combinations of input data and path traversals, with the time between
failures governed by an exponential distribution, yielding a constant hazard. Experimental data is
necessary to choose among these hypotheses.
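
The practical difference between the two hazard forms can be seen by inserting each into Eq. 3-7. A constant hazard gives a reliability that is exponential in t, while a hazard that grows linearly with t gives one that is exponential in t squared. The sketch below is an illustration under that reading of Eqs. 5-5 and 5-9; the parameter values are hypothetical.

```python
import math


def constant_hazard_reliability(phi, N, i, t):
    """R(t) from Eq. 3-7 with z(t) = phi * (N - (i - 1)), as in Eq. 5-5."""
    return math.exp(-phi * (N - (i - 1)) * t)


def increasing_hazard_reliability(phi, N, i, t):
    """R(t) from Eq. 3-7 with z(t) = phi * (N - (i - 1)) * t, as in Eq. 5-9.

    The integral of z from 0 to t is phi * (N - (i - 1)) * t**2 / 2.
    """
    return math.exp(-phi * (N - (i - 1)) * t * t / 2.0)


# Hypothetical values: 100 errors present, none found yet (i = 1), phi = 0.01.
for t in (0.5, 1.0, 2.0, 4.0):
    print(t,
          constant_hazard_reliability(0.01, 100, 1, t),
          increasing_hazard_reliability(0.01, 100, 1, t))
```

With the same constant of proportionality the two curves cross at t = 2: the increasing-hazard model predicts higher reliability for short runs and lower reliability for long ones, so run-length data of the kind discussed in the next section could in principle discriminate between the hypotheses.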


Other related reliability and error models are discussed in Refs. 15, 16 and 17. The author has
also devised a micro reliability model where the failure rate is related to the number of paths in the
program, the path traversal rate, the path run time, and the probability of error along the path. (See Ref. 42.)
We now turn in the next two sections to a discussion of how we can experimentally measure reliability and
use these measurements in conjunction with the models of this section to determine the unknown model
parameters K and $E_T$.


5.6 Experimental Reliability Data

If we had just deployed a large hardware-software system for field use, we could monitor its
reliability by carefully recording the operating time and documenting each failure in detail. Thus, we could
obtain the times between failure. Investigation of each failure should allow one to classify all failures as
hardware, software, operator, or unknown. If we segregate the software times between failure and plot
their average week by week, we will have a quantitative measure of operational software reliability. We
would expect the operational MTTF to increase for the first month (year, in some cases) or so as
software bugs detected in service are removed, then gradually to level off to a relatively constant value. This
is,  of  course,  an  after-the-fact  evaluation  of  the  software  design  and  does  not  allow  one  to  measure  progress 
and/or  need  for  improvement  of  the  software  design  while  it  is  under  development. 

The  earliest  stage  at  which  an  entire  system  can  be  functionally  tested  is  during  system  integration 
using  the  system  exerciser  (functional  test)  program.  If  this  test  is  performed  at  the  beginning  of  system 
integration,  the  result  will  be  a succession  of  very  short  runs  and  immediate  crashes.  Most  software  test 
personnel  would  instinctively  comment  that  this  is  as  expected  since  the  system  is  still  in  "poor  shape" 
and  such  a test  should  be  delayed  until  the  end  when  the  system  is  in  "good  shape".  A bit  of  reflection  leads 
one  to  the  conclusion  that  it  is  just  this  frequent  crashing  which  leads  to  a quantitative  assessment  of  the 
poor  initial  reliability. 

We now focus on the test data and how it should be analyzed. The necessary information which must
be recorded for each run of the system test program is how long the test ran, whether an error occurred,
and if the error is a software error. Sufficient dumps and other documentation must be recorded for
subsequent analysis in order to segregate errors into hardware, software, operator, etc., errors. Each of
the r successful runs represents $T_1, T_2, \ldots, T_r$ hours of success. If there are n total runs, then each of the
(n-r) unsuccessful runs represents $t_1, t_2, \ldots, t_{n-r}$ successful run hours before failure. The total number of
successful run hours H is given by

$$H = \sum_{i=1}^{r} T_i + \sum_{i=1}^{n-r} t_i \tag{5-10}$$

$$\text{Failure Rate} = \lambda = \frac{n-r}{H} \tag{5-11}$$

The MTTF for a constant failure rate is the reciprocal (see Eq. 3-12) of the failure rate

$$\mathrm{MTTF} = \frac{1}{\lambda} = \frac{H}{n-r} \tag{5-12}$$

Now Eqs. 5-11 and 5-12 represent the total system failure rate and MTTF. Since we are mainly
interested in software failures, we assume that the outputs as well as dumps are carefully
investigated for the $(n-r) = x$ failures. Based on the above analysis, the failures are divided
into $x_h$ hardware failures, $x_s$ software failures, $x_o$ operator failures, and $x_u$ unknown
failures. Hopefully, the unknown ratio $x_u/x$ will be 25% or smaller so that most of the data is
classifiable.

Then  the  software  failure  rate  and  MTTF  are  defined  by 


$$\lambda_s = \frac{x_s}{H}$$    (5-13)

$$\text{MTTF}_s = \frac{H}{x_s}$$    (5-14)


Thus, based on the results of this occasional test we can plot $\lambda_s$ and $\text{MTTF}_s$
vs. $\tau$, the debugging time. Such charts should allow a quantitative measure of the progress
in improving software quality. Thus, after $\tau_a$ hours of debugging we would have a measure of
MTTF and R(t), and by extrapolation of the curves we could predict MTTF and R(t) after
$\tau_b > \tau_a$ months of debugging. Unless we knew the functional form of the variations in
R(t) and MTTF with $\tau$ and could determine an appropriate extrapolation scheme, accurate
predictions would be limited to small excursions into the future.
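
The bookkeeping behind Eqs. 5-10 through 5-14 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original analysis: the `Run` record, the failure class names and the sample data are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    hours: float                   # how long the functional test ran
    failure: Optional[str] = None  # None = successful run; otherwise the failure class


def total_hours(runs: List[Run]) -> float:
    # Eq. 5-10: hours of the successful runs plus hours before each failure.
    return sum(run.hours for run in runs)


def failure_rate(runs: List[Run], kind: Optional[str] = None) -> float:
    # kind=None counts every failure (Eq. 5-11); kind="software" gives Eq. 5-13.
    count = sum(
        1 for run in runs
        if run.failure is not None and (kind is None or run.failure == kind)
    )
    return count / total_hours(runs)


def mttf(runs: List[Run], kind: Optional[str] = None) -> float:
    # Eqs. 5-12 and 5-14: reciprocal of the (constant) failure rate.
    rate = failure_rate(runs, kind)
    return float("inf") if rate == 0 else 1.0 / rate


# Illustrative data only: five runs, two of which ran to completion.
runs = [
    Run(2.0),
    Run(0.5, "software"),
    Run(1.5, "hardware"),
    Run(4.0),
    Run(2.0, "software"),
]
print(total_hours(runs), failure_rate(runs), mttf(runs, "software"))
```

Repeating this calculation after each period of debugging gives the points that would be plotted against debugging time.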

5.7 Estimation of Model Constants

Rather than use the raw experimental data and extrapolation for prediction, we can assume an
underlying model for $\lambda_s$ and $\text{MTTF}_s$ and use the data to estimate the model
parameters. If the hypothesized model is correct, then predictions using this technique should be
superior to the extrapolation technique of the previous section.

For  the  software  reliability  model  defined  in  Sections  5-2  and  5-3 


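$$R(t, \tau) = \exp\{-K\,[E_T/I_T - \varepsilon_c(\tau)]\,t\}$$    (5-15)

$$\text{MTTF}(\tau) = \frac{1}{K\,[E_T/I_T - \varepsilon_c(\tau)]}$$    (5-16)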


Note that if we assume a known program size and careful collection of error data, then $I_T$ and
$\varepsilon_c(\tau)$ are known values and only the constants K and $E_T$ remain to be determined.
These two unknowns, K and $E_T$, can be evaluated by running a functional test after two different
debugging times, $\tau_1 < \tau_2$, chosen so that $\varepsilon_c(\tau_1) < \varepsilon_c(\tau_2)$.
We then equate Eqs. 5-14 and 5-16 at times $\tau_1$ and $\tau_2$:


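$$\frac{H_1}{x_{s1}} = \frac{1}{K\,[E_T/I_T - \varepsilon_c(\tau_1)]}$$    (5-17)

$$\frac{H_2}{x_{s2}} = \frac{1}{K\,[E_T/I_T - \varepsilon_c(\tau_2)]}$$    (5-18)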


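Taking the ratio of Eq. 5-17 to 5-18 and using Eq. 5-14 yields

$$\hat{E}_T = I_T\,\frac{(\text{MTTF}_{s1}/\text{MTTF}_{s2})\,\varepsilon_c(\tau_1) - \varepsilon_c(\tau_2)}{(\text{MTTF}_{s1}/\text{MTTF}_{s2}) - 1}$$    (5-19)

Once $\hat{E}_T$ has been computed from Eq. 5-19, we obtain K by substituting Eq. 5-19 into 5-17,
which yields

$$\hat{K} = \frac{\lambda_{s1}}{\hat{E}_T/I_T - \varepsilon_c(\tau_1)}, \qquad \lambda_{s1} = \frac{x_{s1}}{H_1}$$    (5-20)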


The "hats" above $E_T$ and K in Eqs. 5-19 and 5-20 denote estimates of the parameters. Note that
if there was no debugging between $\tau_1$ and $\tau_2$, so that
$\varepsilon_c(\tau_1) = \varepsilon_c(\tau_2)$, the numerator of Eq. 5-19 becomes zero, i.e.,
Eqs. 5-17 and 5-18 are no longer independent and the estimate fails.

Further discussions of this parameter estimation technique, the more powerful maximum likelihood
estimation technique, and accuracy questions appear in Ref. 18.
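
The two-point estimate can be checked numerically. The sketch below assumes the model failure rate $\lambda_s(\tau) = K\,[E_T/I_T - \varepsilon_c(\tau)]$ implied by Eq. 5-16; the function name, argument names and the synthetic constants are illustrative assumptions, not values from the referenced data.

```python
def estimate_constants(lam1, lam2, eps1, eps2, instructions):
    """Two-point estimates of E_T and K.

    lam1, lam2    measured software failure rates at debugging times tau1 < tau2
    eps1, eps2    cumulative corrected errors per instruction at tau1 and tau2
    instructions  program size I_T
    """
    if eps1 == eps2 or lam1 == lam2:
        # No debugging between the two tests: the two equations coincide.
        raise ValueError("the two measurements are not independent")
    ratio = lam2 / lam1  # equals MTTF_s1 / MTTF_s2
    e_total = instructions * (ratio * eps1 - eps2) / (ratio - 1.0)
    k = lam1 / (e_total / instructions - eps1)
    return e_total, k


# Synthetic check: generate failure rates from known constants, then recover them.
I_T, E_T, K = 1000, 10.0, 50.0
eps1, eps2 = 0.002, 0.006
lam1 = K * (E_T / I_T - eps1)
lam2 = K * (E_T / I_T - eps2)
e_hat, k_hat = estimate_constants(lam1, lam2, eps1, eps2, I_T)
print(e_hat, k_hat)
```

With noise-free synthetic data the estimates reproduce the constants used to generate them; with real test data the two measurements carry sampling error, which is why the text points to maximum likelihood estimation as the more powerful alternative.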

6.0  ALLIED  AREAS 


This paper has concentrated on one aspect of software reliability, its measurement theoretically
and experimentally. There are many other allied areas relating to this subject: system recovery
techniques, program design for low error content, production of standard computational programs
which are very reliable, structural complexity, etc.


An historical account of the SAGE air defense system, one of the first real time hardware-software
systems, is worth reading. Other material of interest appears in the Record of the 1973 IEEE
Symposium on Computer Software Reliability, the Proceedings of the Brown Symposium, the NATO
Conference on Software Engineering, the book edited by Rustin, the Proceedings of recent
conferences on Software Reliability (Ref. 33, 35, 36, 46, 47, 48) and a newly published journal.
More material is appearing
each month in the computer research journals on this dynamic new field.


TABLE 1-1

Number of Computer Installations in the United States
(actual and estimated*)

              Total Installations
Year       Source A       Source B
1945              1              -
1950             20          10-15
1955          1,000          1,000
1960          6,000          6,000
1965         30,000         30,000
1970         60,000*        50,000*
1975         85,000*        80,000*

Source A: Ref. 1, p. 29
Source B: Ref. 2, p. 492


TABLE 3-3

System                                               Mean Time Between Crashes
1. MIT MULTICS Time Shared System                    j-9 hours
   (Under Continuous Development)
2. Honeywell MULTICS                                 Very much greater
   (Stable Version of Multics)
3. A data acquisition program (see Ref. 7)           30
4. A commercial time sharing company running         50
   CALL 360 on a 360/50
5. A commercial time sharing company running a       500
   business information system on a minicomputer
   with many attached disk files.


TABLE 1-2

Hardware and Software Failure Rates (see Ref. 7)

           Failure of             Failures    Failure rate    % of Total
Hardware   CPU                        7       4.08x10'^          12.5
           Memory                    12       7.00               21.4
           Disk pack                  1        .59                1.8
           Fixed head disk            2       1.18                3.6
           Power                      3       1.77                5.4
           Misc.                      4       2.32                7.1
Software   Operating system          19      11.1                34.0
           Applications prog.         8       4.6                14.0
           Totals                    56      32.8x10'^          100%


TABLE 4-1

Computation of Model Constants from the Data of Ref. 8

Program          Size      E_T/I_T        ρ_0
Supervisory A    210 K     6. Ux 10'^     0.875 x 10
Supervisory B    240       7.97           0.996
Supervisory C    230       7.48           1.25
Application A    240       13.20          2.20
Application B    240       7.70           1.54
Application C    240       7.00           1.00
Application D    240       12.90          0.995
Average                    8.92           1.26


TABLE 1-3

Distribution of Effort by Programming
Phase for Various Projects (see Ref. 6)

               Analysis       Coding          Checkout
               and Design     and Auditing    and Test
SAGE              39%            14%             47%
NTDS              30             20              50
GEMINI            36             17              47
SATURN V          32             24              44
OS/360            33             17              50
TRW Survey        46             20              34

TABLE 4-2

Computation of Model Constants from the Data of Ref. 11

Program    Size      E_T/I_T       ρ_0
MA         4.03K     25.4x10'^     2.54x 10
MB         1.32      13.7          1.37
MC         5.45      17.1          1.71
MD         1.67      15.6          1.56
ME         2.05      34.6          3.46
MF         2.51      14.7          1.47
MT         2.10      12.4          1.24
MG         0.70      22.9          2.29
MH         3.79      13.2          1.32
MX         3.41      23.4          2.34
Average              19.3          1.93

Fig. 1-1. Ratio of hardware to software expenditures in Air Force computers. Data and
projections. See Ref. 6. (Horizontal axis: year, 1955 to 1985.)

Fig. 1-2. Programming productivity per man month. See Ref. 6. (Vertical axis: machine
instructions. Curve labels: primarily machine language; primarily FORTRAN, JOVIAL, etc.;
structured programming approaches.)

Fig. 1-3. Effect on cost (and reliability?) of high % memory utilization. See Ref. 6.
(Horizontal axis: percent utilization of speed and memory capacity.)

Fig. Typical behavior of a compiler program. (Horizontal axis: time in months.)

Fig. 4-2. Normalized error rate versus debugging time for three supervisory programs.
(Panels: Supervisory A, Supervisory B, Supervisory C. Vertical axis: ρ(τ), changes/inst.
Horizontal axis: τ, months.)

Fig. 4-3. Normalized error rate versus debugging time for four applications programs.
(Panel labels include Appl. A and Appl. D. Vertical axis: ρ(τ), changes/inst.)

Fig. 4-4. Cumulative error curve for supervisory system A given in Fig. 4-2.
(Vertical axis: ε(τ).)

Fig. 4-5. Cumulative errors debugged versus months of debugging. (Curve labels: errors
remaining, errors corrected. Horizontal axis: τ, months of debugging.)
(a) Approaching equilibrium, horizontal asymptote, no generation of new errors.
(b) Approaching equilibrium, generation rate of new errors equals error removal rate.
(c) Diverging process, generation rate of new errors exceeds error removal rate.


Software Integrity Through Visibility

G Belcher and T Egan, Flight Controls Division, Marconi-Elliott Avionic Systems Ltd,
Rochester, Kent ME1 2XX, England


SUMMARY 

This  paper  discusses  visibility  in  the  construction,  testing  and  safety  analysis  of  flight 
control  systems.  A computer  readable  single  source  software  specification  which  allows 
information  to  be  retrieved  in  a more  automated  manner  is  used  to  improve  the  visibility. 

A test  structure  that  can  display  a complete  correlation  between  control  functions  as 
specified  and  as  implemented  also  increases  the  visibility  of  the  test  results.  A data 
flow  analysis  based  on  a complete  cross  reference  and  symbol  table  is  used  as  the  basis 
of  a method  for  increasing  the  visibility  of  the  dormant  and  common  mode  failure  analysis. 


INTRODUCTION 


Digital flight control systems which use software to sequence and perform the control
functions in real time require that this software be constructed, tested and analysed with
at least the same integrity as the hardware. A major task in the development of this
software is the production of sufficient information about the software from the
constructing, testing and analysis to satisfy high integrity requirements. The software is
defined by an ordered set of alphanumeric characters. This set of characters is used in
various ways to produce an equally ordered set of numbers which are loaded into the
flight control computer so that the correct sequence of flight control functions is
performed.

The process for producing the software for flight control applications must be configured
and structured in such a manner that no information is lost or distorted whilst performing
the transformation. In order to show that the information content of the software has not
been distorted or lost in any way, the software production processes must display what
is happening to the set of characters as they move through the various stages of change.

As each process is completed, evidence must be presented to show that the functional and
safety requirements are still satisfied. It is the visibility provided by the presentation
of this evidence that this paper will discuss. It is to be noted that the visibility
referred to here means not only formal documentation but, more importantly, methods for
proving the correctness and safety of the flight control software.

The  presentation  of  information  will  be  through  line  printer  listings,  visual  display  units 
and  graph  plotters.  This  note  identifies  the  information  required  in  order  to  show  that 
the  flight  control  software  has  preserved  its  integrity  and  specifies  the  visibility 
requirements  for  this  information  when  it  is  displayed. 

For the purposes of this note the software is considered to be constructed using structured
programming techniques with modules and groups of modules linked by a data module into
segments. The module groupings or segments are required so that the various control
functions may be called by the executive module at differing iteration rates, e.g. Pitch
Control Law functions at 50 Hz, gain schedules at 10 Hz, and structural filters at 100 Hz.
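
The multi-rate calling scheme can be illustrated with a minimal executive sketch. It assumes a 100 Hz base frame and rates that divide the base rate exactly; the segment names and the `make_schedule` helper are hypothetical and are not taken from the system described.

```python
BASE_RATE_HZ = 100  # assumed base frame rate: the fastest segment in the example

# (segment name, iteration rate in Hz); names are illustrative only
SEGMENTS = [
    ("structural_filters", 100),
    ("pitch_control_law", 50),
    ("gain_schedules", 10),
]


def make_schedule(segments, frames):
    """Return, for each base frame, the list of segments the executive calls."""
    schedule = []
    for frame in range(frames):
        called = [
            name for name, rate in segments
            if frame % (BASE_RATE_HZ // rate) == 0  # every 1st, 2nd or 10th frame
        ]
        schedule.append(called)
    return schedule


# One second of operation at the base rate.
schedule = make_schedule(SEGMENTS, BASE_RATE_HZ)
calls_per_second = {
    name: sum(name in frame for frame in schedule) for name, _ in SEGMENTS
}
print(calls_per_second)
```

Because the running order in every frame is fixed by the segment table, the table itself can be printed and compared against the specified running order.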

The complete digital flight control system is considered to be multichannel with similar
hardware redundancy and similar software in each channel with a channel identification
flag providing, where necessary, run-time software dissimilarity. The hardware/software
complexity is such that there are some 200 uniquely identifiable modules, averaging 50
program store locations, each with approximately 5 inputs. The assembler symbol table
for this software is required to hold 3000 six character data and program identifiers
which are sorted according to symbol name and symbol value and stored on separate files.
Symbol tapes are specified by the assembler, as necessary, for use with the software
cross-reference table. The software is produced using a macro expand facility, which
allows control functions to be uniquely defined and then used, where specified, throughout
the system. The maintenance of the high integrity required for this type of flight
control software, through visibility during the construction, testing and safety analysis
processes applied as the software passes through the software production facility, will
now be discussed in more detail.


1. SOFTWARE CONSTRUCTION

1.1 Documentation


The software documentation is the medium used for controlling the progress of each
module. Software design requirements documents are needed at the beginning of a project
so that the software specification and structure can be identified. Once the specification
and structure of the software have been defined, the specification and structure documents
are used to identify and form the data structure (data module) that will be used by the
sequence controller (executive module) to call the various program segments, as and when
required, to perform the various control functions. The individual modules will also
refer to the above documents for their data and functional structures. Apart from any
system or block diagrams, these documents are recorded in the form of alphanumeric
character strings. It is these documents that are to be used by the coder to produce,
with the help of an assembler, the numbers that are to be loaded into the aircraft
computers. A record of these numbers (code) produced will also form part of the
constructional documentation for the software.

In order to preserve the integrity of the flight control software it is essential that
this documentation be produced at the beginning of the software assembly and be dynamic
enough to be able to control the production of the modules.

The visibility requirement for the documentation that will control the software module
through its constructional processes may be met using documents which can be formatted
to be computer readable. The formatting required is not of a complicated nature; it
consists of various tables and operating system commands.

In this way the total information content of the software specification and structure is
stored in a disciplined and controllable manner. The process of making the documents
computer readable will in itself improve not only the visibility but also the integrity
of the final software produced.

1.2  Configuration  Control 

The major functions of the configuration control system are the control of a) the
information as specified, b) the information as implemented, c) the changes to a) and/or
b), and d) the project schedule.

The specification information concerning the flight control software consists of parameters
such as module/segment identification, input/output lists, size/runtime, and functional
running order. The implementation information is as above but with the added parameters
concerning the coding, testing, integration and validation of the modules/segments. The
change information is used to control all interaction effects on the other modules/segments.

For project management purposes the visibility requirement is for a dynamic display of
the correlation of budgeted with actual progress so that the differences are easily
found. The display is usually in the form of build status reports. The use of an
automated data base system gives up to date details of budget/actual storage/runtimes,
module status, work in progress/outstanding, change status and scheduling information.

This information is obtained by using the host computer to continually search for
differences between the flight control functions as specified and as implemented. The
configuration control system is so structured that when a change, either to the
specification or to the implementation information, is requested then a search of the
complete data base is possible. This search will itemise the total effect of the change on
the flight control software, and will reschedule the work accordingly. Up to date change
status is given with its corresponding impact on the software construction schedule.
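
The search for differences between the information as specified and as implemented can be sketched as a field-by-field comparison of two records per module. The record layout, field names and module identifiers below are assumptions made for illustration; the paper does not define the data base format.

```python
def find_differences(specified, implemented):
    """Return (module, field, specified value, implemented value) for every mismatch."""
    differences = []
    for module in sorted(set(specified) | set(implemented)):
        spec = specified.get(module)
        impl = implemented.get(module)
        if spec is None or impl is None:
            # Module exists on one side only.
            differences.append((
                module, "module",
                "present" if spec is not None else "absent",
                "present" if impl is not None else "absent",
            ))
            continue
        for field in sorted(set(spec) | set(impl)):
            if spec.get(field) != impl.get(field):
                differences.append((module, field, spec.get(field), impl.get(field)))
    return differences


# Hypothetical six-character module identifiers and parameters.
specified = {
    "PITCH1": {"inputs": ["Q", "THETA"], "size": 50, "rate_hz": 50},
    "GAINS1": {"inputs": ["MACH"], "size": 40, "rate_hz": 10},
}
implemented = {
    "PITCH1": {"inputs": ["Q", "THETA"], "size": 54, "rate_hz": 50},
}
for difference in find_differences(specified, implemented):
    print(difference)
```

Running the same comparison after every requested change gives an itemised list of what the change affects, which is the information the build status report needs.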

Any potential overruns of storage and runtime requirements are noted in the early stages
of the project. Hence, the loss of integrity which might occur due to the deletion of
control functions to conserve the resources of the flight control system will be avoided.

It is to be noted that full character sum checks are required in the implementation of
the automation of the configuration control facility. Integrity based on character files
which are to be edited and continually updated means that the system must be protected
at all stages against loss of characters which will distort the information content of
the flight control software specification and structure.
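
A character sum check of the kind referred to above might look like the following sketch. The exact check used by the authors is not given, so the pairing of a character count with a modular sum of character codes, the 16-bit modulus and the sample record are all assumptions.

```python
def character_sum(text, modulus=1 << 16):
    """Return (character count, sum of character codes modulo `modulus`)."""
    return len(text), sum(ord(ch) for ch in text) % modulus


# Hypothetical specification record held in a character file.
record = "PITCH1 IN Q,THETA OUT ELEVCM SIZE 50"
reference = character_sum(record)

damaged = record.replace("THETA", "THTA")          # one character lost
substituted = record.replace("SIZE 50", "SIZE 58")  # one character altered
transposed = record.replace("Q,THETA", "THETA,Q")   # same characters, new order

print(character_sum(damaged) != reference)
print(character_sum(substituted) != reference)
print(character_sum(transposed) == reference)
```

The last case shows the known limit of a plain sum: it detects lost or altered characters but not reordering, so a position-sensitive check would be needed if transposition is a credible failure.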

A further improvement in the integrity can be obtained by using the flight control
software configuration control procedures on the software required to implement these
procedures.

1.3  High  Level  Languages 

A major  task  when  specifying  and  structuring  a flight  control  function  or  sequence  of 
functions  is  to  give  the  full  visibility  required  so  that  the  specification  and  structure 
can  be  analysed  in  order  to  validate  that  it  is  safe.  By  using  a high  level  language  at 
the  specification  and  structuring  stages  of  a project,  an  improvement  of  the  visibility  in 
and  the  integrity  of  the  flight  control  software  constructional  processes  can  be  made. 

The functions required for flight control are essentially simple ones, which are ordered
in a complex but uncomplicated manner. The high level language statements will therefore
form a very simple subset of those available to that language. A standard high level
language is preferred in order to make use of the integrity and visibility already
available from its own controls. The high level language used to specify and structure
the flight control software can also be used by the configuration control software, and,
when required to be compiled, can use the same host computer compiler. Commonality of
compilers and languages for a project will, in itself, improve the integrity of any
software produced by that project.

High level languages, by their intrinsic clarity and definition, improve the visibility of
the software specification. Should the statements also be written with the use of GOTO
directives kept to a controlled minimum then the visibility is again improved as far as
a validation procedure is concerned. A flight control software function specified by
simple high level language statements, backed up with a scaled system diagram, has higher
visibility and integrity than a flow chart used for the same purpose. The discipline
required to write the high level language statements and the potential they have for host
computer processing make them suitable as the media for flight control software
specification and structuring. Flow charts, where essential to provide information when
validating the flight control software, are to be added as processing notes only.

The  improved  visibility  at  the  software  specification  level  has  also  the  potential  for 
an  improved  translation  by  the  coder  into  the  numbers  which  are  to  be  loaded  into  the 
aircraft  computers. 

Various groups of engineers, with different backgrounds, are required to perform certain
tests  and  validation  on  the  integrated  software/hardware  flight  control  system.  The  use 
of  a high  level  language  as  the  software  specification  gives  the  project  a single  source 
for  all  the  information  regarding  the  flight  control  software.  This  single  source  will 
allow  the  various  groups  to  work  through  a common  language  while  they,  as  individual 
groups,  approach  the  testing  and  analysis  from  different  directions.  This  commonality 
of  language  will  make  the  results  of  the  various  groups'  analysis  more  visible.  This 
single  source  is  also  used  for  configuration  control  to  further  correlate  the  information 
as  specified  with  the  information  as  implemented. 

1.4  Proof  of  Correct  Software  Construction 

As the software specification moves through the construction stages and is gradually
changed into the numbers that are loaded into the aircraft computer, certain information
is presented to the configuration control group so that the "static" checks, as discussed
in the previous sections, can be done. These checks take the form of a continual
correlation between the specification and its implementation with any differences
highlighted. The most rigorous method is to take the numbers from the aircraft computer and
process them so that the original specification is produced. The same correlation is
performed along the backward path as is done on the forward path. For most of the
forward constructional processes, with their associated correlations, there are inverses
so that the equivalent reverse constructional processes, with their own associated
correlation, can be performed. By structuring the software and its production processes
so that these inverses are available, it is possible to provide further proof that the
software, as specified, has been implemented correctly. In addition to showing that the
software is constructed correctly (partially by performing these "static" checks) the
specification and the structure itself must be tested to prove its correctness. This
form of testing takes place at execution time, and is discussed in the following sections.
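
The forward and backward correlation can be illustrated with a toy round trip. The three-instruction set, the 16-bit word layout and the symbol names below are invented for the sketch; they stand in for whatever assembler and flight computer word format a project actually uses.

```python
# Toy instruction set: 4-bit operation code, 12-bit operand address (assumed layout).
OPCODES = {"LOAD": 0x1, "ADD": 0x2, "STORE": 0x3}
MNEMONICS = {code: name for name, code in OPCODES.items()}


def assemble(lines, symbols):
    """Forward path: symbolic statements to the numbers loaded into the computer."""
    words = []
    for line in lines:
        operation, operand = line.split()
        words.append((OPCODES[operation] << 12) | symbols[operand])
    return words


def disassemble(words, symbols):
    """Backward path: recover the symbolic statements from the loaded numbers."""
    names = {address: name for name, address in symbols.items()}
    return [f"{MNEMONICS[word >> 12]} {names[word & 0xFFF]}" for word in words]


symbols = {"PITCHR": 0x010, "GAINQ": 0x011, "ELEVCM": 0x012}
specified = ["LOAD PITCHR", "ADD GAINQ", "STORE ELEVCM"]

words = assemble(specified, symbols)
recovered = disassemble(words, symbols)

# Simulate a single-bit corruption of the loaded code and correlate again.
corrupted = list(words)
corrupted[1] ^= 0x001
mismatches = [
    (index, expected, actual)
    for index, (expected, actual)
    in enumerate(zip(specified, disassemble(corrupted, symbols)))
    if expected != actual
]
print(recovered == specified, mismatches)
```

The check only works because every forward step has an inverse; a construction step that discards information (for example, one that strips symbol names with no table kept) could not be correlated this way.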

2.  SOFTWARE  TESTING 

2.1  Test  Documentation 

The testing that will be discussed in this section refers to those tests that are to be
performed on the flight control software before it is loaded into the aircraft computers.
The documentation controlling the testing of the software, like that which controls the
construction of the software, should provide the maximum visibility at each stage in the
test procedure. Improved visibility and integrity can be achieved by making the test
specification and structure documents computer readable so that correlation between
expected and obtained results is facilitated. The tests that are performed on the
software after the construction and before the loading onto the aircraft computers are, in
general, dynamic tests. To improve visibility the individual tests are of a simple
nature and are performed using the binary numbers that are to be loaded onto the aircraft
computers. These same tests can be run on the high-level language specification. In
this way, the results from the specification and the binary numbers representing that
specification can be correlated and any differences highlighted by the host computer.

Configuration  control  is  applied  to  the  testing  as  well  as  the  construction  of  the 
software.  A major  feature  of  the  automated  construction  of  the  software  is  the  data 
module  or  symbol  table.  This  file  records  the  numerical  correspondence  for  every  label 
or  data  identifier  required  by  the  flight  control  software  and  can  be  used  to  improve 
the  visibility,  not  only  of  the  software  construction,  but  also  of  the  software  testing. 
The symbol table file is available when defining the test structure, so that all test
specifications  can  identify  the  input  and  output  values  by  their  software  names,  instead 
of  only  their  numerical  or  address  representation.  This  improves  the  visibility  in  the 
test  results  checking  procedure  and  hence  helps  to  preserve  the  integrity  of  the  flight 
control  software. 

Certain  integration  tests  have  to  be  performed  on  the  software  modules/segments  before 
any  execution  tests  can  be  done  to  check  their  system  performance.  The  integration  tests 
provide,  for  validation,  such  information  as  input/output  lists,  module/segment  size/ 
maximum  runtime  and  module/segment  identification.  This  information  is  correlated  with 
the  corresponding  information  as  specified  in  the  structure  document  and  the  differences, 
if  any,  are  displayed.  The  feedback  for  this  cross  reference  process  is  supplied  by  the 
symbol  table.  This  cross  referencing  information,  usually  presented  in  the  form  of  a 
cross  reference  table,  is  often  provided  by  the  assembler  program.  For  improved 
visibility this cross reference table can be produced from the binary file that is loaded
into the aircraft computers. Thus it will be a feature of the software simulator,
as well as the assembler. This cross reference table is available for the safety
analysis that is to be performed on the flight control software structure.

2.2  Performance  Tests 


The following limited set of examples illustrates the visibility that can be obtained when
testing software on a host computer prior to the loading of the software modules/segments
into the aircraft computers.


i) Overflow tests

All combinations of maximum and minimum scaled values of the inputs are used to test the
module/segment for overflow. All modules followed by a limit function have their outputs
correlated with that of the limiting function. It is a requirement that the test
structure be able to do this correlation and present the information in the form of a
comparison between the module outputs and the specified limiting function. Should the
limiting function not be present in the code then the test structure is to highlight this
as it formats the above information. For improved visibility and integrity the complete
module/limit function information must be available when testing the module for overflow.
Similar information is also required for signal selector and monitor threshold tests.
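
An overflow test over all combinations of extreme input values can be sketched as follows. The 16-bit word length, the input scaling and the two example modules are assumptions made for the illustration; Python integers do not overflow, so the word limits are checked explicitly.

```python
from itertools import product

WORD_MAX = 2**15 - 1  # assumed 16-bit two's complement flight computer word
WORD_MIN = -2**15


def overflow_cases(module, input_ranges):
    """Run `module` on every combination of minimum and maximum scaled inputs.

    Returns the (inputs, output) pairs whose output does not fit in one word.
    """
    failures = []
    for inputs in product(*input_ranges):
        output = module(*inputs)
        if not WORD_MIN <= output <= WORD_MAX:
            failures.append((inputs, output))
    return failures


def summing_junction(a, b, c):
    return a + b + c  # no limit function follows this module


def limited_summing_junction(a, b, c):
    return max(WORD_MIN, min(WORD_MAX, a + b + c))  # limit function present


# Three inputs, each scaled to the range -20000 .. +20000 (illustrative values).
ranges = [(-20000, 20000)] * 3
unprotected = overflow_cases(summing_junction, ranges)
protected = overflow_cases(limited_summing_junction, ranges)
print(len(unprotected), len(protected))
```

With n inputs the test runs 2^n cases, which stays small for modules with the roughly five inputs quoted earlier (32 cases each).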

ii)  Decision  path  tests 

Should a module contain more than one process path then the test structure is to be able
to show, by a trace feature, that all the paths are not only there but are capable of
being followed. The number of paths must correspond exactly to the number specified. Any
discrepancy is to be highlighted. For increased visibility the decisions within a module
are to be in the form of a "true/false" structure. When the decision is true, a certain
function is performed; when the decision is false, this function is bypassed.

It  is  suggested  that  only  forward  pointing  GOTO  directives  are  used  to  improve  the  storage 
and  runtime  required  by  the  module.  This  will  preserve  the  simple  and  hence  visible 
structure  of  the  decision  without  adversely  affecting  the  complete  flight  control  structure 
by  using  too  much  program  store  and  iteration  period. 
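
A trace feature for decision path tests can be sketched as below. The example module, its threshold and the way the trace is recorded are illustrative assumptions; the point is only that the number of distinct paths exercised is compared with the number specified.

```python
def limiter_module(x, trace):
    """Single true/false decision: when true the limit is applied, else bypassed."""
    y = x
    if x > 100:
        trace.append("limit:true")
        y = 100
    else:
        trace.append("limit:false")
    return y


def paths_exercised(module, test_inputs):
    """Return the set of distinct decision traces produced by the test inputs."""
    paths = set()
    for value in test_inputs:
        trace = []
        module(value, trace)
        paths.add(tuple(trace))
    return paths


SPECIFIED_PATHS = 2  # number of paths stated in the (hypothetical) module specification

complete = paths_exercised(limiter_module, [50, 150])
incomplete = paths_exercised(limiter_module, [50, 60])

for label, paths in (("complete", complete), ("incomplete", incomplete)):
    status = "ok" if len(paths) == SPECIFIED_PATHS else "DISCREPANCY"
    print(label, len(paths), status)
```

Keeping each decision in the true/false form means a module with k decisions has at most 2^k traces, so the specified path count is easy to state and to check.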

iii)  Self  contained  function  tests 

When  testing  the  function  of  a module,  improved  visibility  can  be  obtained  by  structuring 
the  flight  control  software  such  that  each  module  only  performs  a single  simple  function. 

This  function  should  be  able  to  be  performed  and  tested  without  using  any  other  function 
or  module  available  in  the  software.  This  avoids  the  need  to  retest  a module  every  time 
another  module  is  changed.  For  improved  visibility  the  inputs  to  a module  should  be 
available  at  the  time  the  module  is  entered,  and  should  not  be  required  to  be  calculated 
during  the  execution  of  that  module.  In  this  way,  the  values  for  the  inputs  can  be 
examined  before  and  after  running  the  module  and  can,  therefore,  be  shown  to  have  been 
calculated  independently  of  any  other  software  function.  This  type  of  software  structure 
also  improves  the  visibility  from  the  safety  and  data  flow  analysis  viewpoint  because 
the  only  way  that  a module  can  affect  itself  and  other  modules  is  via  its  data  outputs. 

iv)  Macro  and  subroutine  call  tests 

Macro expansions can be used with advantage for flight control software to call standard
control functions and to load subroutine parameters. These standard functions can be
tested separately from the modules that call them, while the calling module has to be
tested for the correct positioning of the macro call and for the correct number and naming
of any parameters. The use of macros also increases the visibility of the software
specification by giving a one to one correspondence between the name of the macro and the
function that it performs. The running order for the control functions is also made more
visible and can be checked against the system diagram with increased integrity.

The  macro  as  specified  can  he  correlated  with  the  macro  as  implemented  in  the  symbolic 
assembler  language.  The  number  of  actual  parameters,  and  the  running  order  can  be 
compared  with  those  as  specified  and  differences  displayed.  In  order  to  check  that  the 
macro  as  specified  is  on  the  binary  file  a substitution  module  is  used.  This  module  is 
designed  with  reference  to  the  specification  for  the  macro  and  not  with  reference  to  the 
calling module. The substitution module will improve the visibility of the macro as
called by printing, at simulation time, the contents of each parameter, together with
the  macro  identification  and  position  in  the  running  order. 
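
A substitution module of this kind might look like the sketch below. It is written in Python purely for illustration; the macro names `LIMIT` and `LAG`, their parameter counts and the calling sequence are assumptions, not taken from the paper. Each stand-in records the macro identification, its position in the running order and the contents of its parameters, and the log is then compared with the specification.

```python
# Specification of each macro: name -> number of parameters (hypothetical).
MACRO_SPEC = {"LIMIT": 3, "LAG": 2}

def make_substitute(name, log):
    """Build a stand-in for macro `name` that records how it was called."""
    def substitute(*params):
        position = len(log) + 1          # position in the running order
        log.append((position, name, params))
    return substitute

def check_calls(log, spec, expected_order):
    """Return the differences between the calls as made and as specified."""
    differences = []
    actual_order = [name for _, name, _ in log]
    if actual_order != expected_order:
        differences.append(("running order", expected_order, actual_order))
    for position, name, params in log:
        if len(params) != spec[name]:
            differences.append(("parameter count", position, name, len(params)))
    return differences

log = []
LIMIT = make_substitute("LIMIT", log)
LAG = make_substitute("LAG", log)

# Hypothetical calling module under test: LAG is called with one
# parameter too few, which the check should report.
LAG(0.5)
LIMIT(1.2, -1.0, 1.0)

diffs = check_calls(log, MACRO_SPEC, ["LAG", "LIMIT"])
```

The stand-ins are built from the macro specification alone, never from the calling module, which is the independence the text asks for.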

To  improve  the  visibility  no  macro  definition  should  contain  a call  to  a subroutine  and 
all  subroutine  parameters  should  be  loaded  before  and  off  loaded  after  the  subroutine 
call.  The  software  specification  and  the  binary  file  are  both  checked  for  correct  use 
of  subroutines.  Further,  all  calls  to  subroutines  in  the  flight  control  software  are 
documented  and  labelled  so  that  they  can  be  easily  identified  during  the  safety  analysis. 

2.3  Random  Input  Testing 

The  tests  described  above  are  structured  so  that  the  visibility  of  the  software 
constructional  process  is  improved.  This  test  structure  presents  ample  evidence  that  the 
binary file is a correct representation of the software as specified.

The  prime  purpose  of  the  tests  on  the  host  computer  is  to  present  evidence,  in  the  most 
visible form, that the binary file will execute the flight control functions as specified.
When  the  binary  file  is  tested  using  input  numbers  rather  than  by  examination  of  the  code, 
the  numbers  are  to  be  a simple  set,  chosen  to  confirm  that  the  code  is  correctly  representing 
the  software  specification  at  execution  time.  It  is  only  when  this  correct  representation 
at  execution  time  has  been  confirmed  that  the  modules/segments  can  be  released  for  further 
testing.

When testing on the host computer, visibility is preserved by limiting the tests to
aspects such as simple dynamic and end-to-end resolution tests. More complex tests,
such as random inputs, are best performed on a hybrid computer or on a system rig,
since over-complex tests on a host computer may decrease the visibility rather than
increase  it.  Thus  when  testing  the  flight  control  software  on  the  host  computer, 
particular  combinations  of  inputs  are  preferred  to  random  inputs  so  that  the  test  results 
can  be  presented  in  a simple  and  easily  understood  manner.  All  tests  are  to  be  structured 
so  that  their  results  can  be  used,  without  ambiguity,  for  the  safety  analysis  to  be 
performed  on  the  flight  control  software. 

3.  SAFETY  ANALYSIS 

3.1  Configuration  Control  Examination 

The Failure Modes and Error Analysis (FMEA) for the flight control software must examine the
software preparation procedures for the types of error that can occur and the methods
used  to  remove  these  errors.  The  test  procedure  must  be  examined  to  determine  the  type 
of  errors  detected  by  the  various  tests.  The  sources  of  the  test  data  sets  must  be 
examined  for  possible  errors  and  the  effect  of  these  errors  on  the  whole  test  structure 
must  be  analysed.  The  simple  dynamic  tests  performed  on  a complete  control  law  function 
must be checked to see that they exercise all the paths through the function. If any
paths  are  found  to  be  untested  at  this  stage  then  the  function  must  be  retested  to 
exercise  them.  The  authorship  of  the  test  specifications  must  be  checked  to  ensure  that 
the  same  programmer  did  not  design  a module/segment  and  test  the  same  module/segment. 

The assembler errors produced while integrating the modules must be examined for
weaknesses in the preparation procedures.

The  Hardware/Software  test  procedure  must  be  examined  to  show  that  the  tests  that  were 
too  complicated  (and  hence  not  sufficiently  visible)  to  perform  on  the  host  computer 
have  been  included  in  this  type  of  testing.  The  configuration  control  procedures  for 
implementing  software  change  requests  and  for  reworking  modules  must  be  examined  to 
ensure  that  the  correct  issue  of  the  module  is  incorporated  into  the  software  correctly. 
The  procedures  for  correcting  and  redesigning  the  function  of  a module  must  be  examined 
to  make  sure  that  the  code  is  completely  retested  before  being  integrated  with  the  flight 
software.  The  complete  test  plan  for  the  flight  control  software  must  be  examined  to 
show that the objective of reducing the untested code to zero has been achieved.

The  history  of  the  design  and  coding  errors  must  be  examined  in  order  to  establish  that 
the  change  procedure  gives  the  correct  feedback  and  that  weaknesses  in  the  testing 
methods  discovered  are  corrected  and  the  modules  affected  are  retested.  The  analysis 
of  the  test  procedures  must  show  that  each  test  will  be  successful  in  uncovering  the 
type  of  error  for  which  it  was  designed.  The  software  specification  must  be  reviewed 
as the module test results become available. This review will require an FMEA to be
done on the module at the time that it is tested. In order to improve the visibility
and integrity of the complete software FMEA it is recommended that each type of test
has its own associated failure modes and error analysis, which is to be done at the
time the test is performed.

3.2  Cross  Reference  and  Data  Flow  Analysis 

The  cross  reference  information  provided  by  the  software  simulator  is  obtained  from  the 
binary file. This must be presented in a simple visible manner, e.g. complete lists of
inputs and outputs, lists of modules giving their inputs and outputs, lists giving the
position of every subroutine call. These lists will also give the same information as
specified, with the differences highlighted. A more visual way of representing this
cross reference information could be in the form of a software "wiring" diagram similar
to that used for the hardware. In addition to this data connection display, the timing
information particular to a module must be presented in a visual form. Data flow
analysis is an essential feature of the software FMEA. All modules must be shown to
be in the correct time "slot" for the flight control functions. The problem with data
flow  analysis  concerns  the  presentation  of  the  information  in  the  most  visible  form  so 
that  the  analysis  can  be  done  with  high  integrity.  It  is  suggested  that  an  interactive 
method,  involving  the  host  computer,  is  a suitable  method  of  improving  the  visibility 
of  the  data  flow  analysis.  The  system  diagram  as  specified  is  to  be  used,  together  with 
the  cross  reference  data,  during  the  interactive  process  to  enable  the  operator  to  follow 
any  data  path  from  sensor  input  to  system  output  command.  The  interaction  is  required 
so  that  the  modules  of  interest  can  be  quickly  and  efficiently  specified  and  checked  by 
the  operator.  Hard  copy  is  produced  so  that  the  flow  for  any  particular  data  item  can 
be  analysed.  Any  timing  errors  or  data  skew  are  highlighted  on  the  printout. 
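
The cross reference comparison and the tracing of a data path can be sketched as below. The example is in Python and is only an assumption about how such lists could be held; the module and signal names are hypothetical, and the real information would be extracted from the binary file by the software simulator. The sketch lists the modules whose inputs or outputs differ from the specification and follows one data item from sensor input to output command.

```python
# Cross reference lists: module -> (inputs, outputs). All names hypothetical.
SPECIFIED = {
    "SENSOR_SCALE": (["pitch_rate_raw"], ["pitch_rate"]),
    "PITCH_LAW":    (["pitch_rate", "stick"], ["pitch_demand"]),
    "ACTUATOR_OUT": (["pitch_demand"], ["elevator_cmd"]),
}
IMPLEMENTED = {
    "SENSOR_SCALE": (["pitch_rate_raw"], ["pitch_rate"]),
    "PITCH_LAW":    (["pitch_rate"], ["pitch_demand"]),   # "stick" is missing
    "ACTUATOR_OUT": (["pitch_demand"], ["elevator_cmd"]),
}

def differences(spec, impl):
    """Modules whose input/output lists differ from the specification."""
    return sorted(m for m in spec if spec[m] != impl.get(m))

def trace(xref, item, target, path=()):
    """List every chain of modules that carries `item` through to `target`."""
    if item == target:
        return [list(path)]
    found = []
    for module, (inputs, outputs) in xref.items():
        if item in inputs and module not in path:
            for out in outputs:
                found.extend(trace(xref, out, target, path + (module,)))
    return found

changed_modules = differences(SPECIFIED, IMPLEMENTED)
sensor_path = trace(SPECIFIED, "pitch_rate_raw", "elevator_cmd")
stick_path_as_built = trace(IMPLEMENTED, "stick", "elevator_cmd")
```

An empty result for `stick_path_as_built` is the kind of difference that would be highlighted on the printout; timing "slot" information is not modelled in this sketch.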

3.3  Dormant  Failure  Analysis 

One  of  the  advantages  in  designing  the  flight  control  software  in  the  form  of  structured 
modules  each  performing  a simple  function  is  that  the  dormant  failure  analysis  can  be 
conducted by considering that the malfunction of a module is due to the corruption of the
input  data  by  a prior  module.  The  module  under  consideration  is  assumed  to  be  working 
correctly.  Thus  a dormant  failure  is  any  fault  that  causes  an  erroneous  input  to  be 
passed  to  a module  such  that  any  errors  in  both  the  input  and  output  remain  undetected. 


The  type  of  dormant  failure  that  can  cause  one  or  more  good  channels  to  be  in  error  is 
of  particular  concern  for  the  analysis.  To  increase  confidence,  the  data  flow  analysis 
can be used to follow the path of data from any particular input and check what its
effect is and whether it will be monitored. The results of this analysis are presented
in such a manner that it can be shown that all inputs and all data paths have been
considered. Failure cases that are still suspect after this analysis are examined
further, in some cases with simulation of the failed cases on the host computer.

The software specification for the monitor structure is also examined for particular
dormant  failures.  Each  module  containing  any  monitoring  function  is  examined  for 
erroneous  inputs  and  outputs.  Again,  simulation  on  the  host  computer  will  improve  the 
visibility  of  the  results. 
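
One way to show that every input has been considered is to compute, for each input, the set of modules it can reach and to flag the inputs that never reach a monitoring function. The Python sketch below assumes a hypothetical cross reference table and a hypothetical monitor named `RATE_MONITOR`; it illustrates the bookkeeping only and says nothing about whether a given monitor would actually detect a given error.

```python
# Module -> (inputs, outputs). Names are hypothetical.
XREF = {
    "SENSOR_SCALE": (["pitch_rate_raw"], ["pitch_rate"]),
    "RATE_MONITOR": (["pitch_rate"], ["rate_valid"]),
    "PITCH_LAW":    (["pitch_rate", "stick"], ["pitch_demand"]),
    "TRIM":         (["trim_switch"], ["trim_demand"]),
}
MONITORS = {"RATE_MONITOR"}

def downstream(xref, item):
    """Every module that `item` can affect, directly or through other modules."""
    seen_modules, seen_items, frontier = set(), {item}, [item]
    while frontier:
        current = frontier.pop()
        for module, (inputs, outputs) in xref.items():
            if current in inputs and module not in seen_modules:
                seen_modules.add(module)
                for out in outputs:
                    if out not in seen_items:
                        seen_items.add(out)
                        frontier.append(out)
    return seen_modules

def unmonitored_inputs(xref, inputs, monitors):
    """Inputs whose data paths never pass through a monitoring module."""
    return sorted(i for i in inputs if not (downstream(xref, i) & monitors))

suspects = unmonitored_inputs(
    XREF, ["pitch_rate_raw", "stick", "trim_switch"], MONITORS)
```

The inputs returned in `suspects` are candidates for the further examination, or host computer simulation, that the text describes.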

3.4  Analysis  of  Discontinuities 

As  part  of  the  complete  common  failure  mode  analysis,  the  arithmetic  as  defined  in  the 
software  specification  is  studied  for  possible  discontinuities.  Typically  the  code  is 
examined  for  instructions  that  can  cause  an  overflow  condition  to  occur.  This  search 
can  be  automated  and  the  information  can  be  displayed  with  improved  visibility  by  means 
of  the  assembler  symbol  table.  Typical  information  printed  will  give  the  program 
location  at  which  the  overflow  can  occur,  together  with  the  variable  names  that  are  able 
to  contribute  to  this  overflow.  Possible  effects  of  the  overflow  are  also  to  be  analysed, 
by  means  of  the  data  flow  analysis. 

After  identifying  all  the  program  positions  at  which  overflow  can  occur,  the  range  of 
the  numbers  contained  in  the  contributing  variables  is  examined  to  identify  what 
particular  range  of  input  values  will  cause  the  overflow  to  occur.  The  data  flow 
analysis will also be used to follow these particular module inputs back to the sensor
inputs.
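
The automated search for possible overflows can be illustrated with simple range arithmetic. The sketch below is in Python and makes several assumptions that are not in the paper: a 16-bit two's-complement word, a two-instruction program, and the variable names and ranges shown. It propagates the range of each variable through the program and reports the program location and contributing variables wherever the result can leave the word range.

```python
WORD_MIN, WORD_MAX = -32768, 32767   # assumed 16-bit two's-complement word

def add_range(a, b):
    """Range of x + y when x lies in a and y lies in b."""
    return (a[0] + b[0], a[1] + b[1])

def mul_range(a, b):
    """Range of x * y when x lies in a and y lies in b."""
    products = [x * y for x in a for y in b]
    return (min(products), max(products))

def can_overflow(r):
    return r[0] < WORD_MIN or r[1] > WORD_MAX

# Hypothetical program: (location, operation, operand names, result name).
PROGRAM = [
    (0x0100, add_range, ("stick", "trim"), "demand"),
    (0x0104, mul_range, ("demand", "gain"), "command"),
]
INPUT_RANGES = {"stick": (-8000, 8000), "trim": (-2000, 2000), "gain": (0, 4)}

def find_overflows(program, input_ranges):
    """Return (location, contributing variables, result range) per overflow."""
    ranges = dict(input_ranges)
    report = []
    for location, operation, operands, result in program:
        r = operation(ranges[operands[0]], ranges[operands[1]])
        if can_overflow(r):
            report.append((location, operands, r))
        ranges[result] = r
    return report

overflows = find_overflows(PROGRAM, INPUT_RANGES)
```

Narrowing `INPUT_RANGES` until the report is empty gives a rough picture of which input values cause the overflow; tracing those variables back to the sensor inputs remains a job for the data flow analysis.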

CONCLUSION 

Techniques  have  been  developed  which  give  good  visibility  of  flight  control  software 
during the writing of the specification, the testing on the host machine and the
analysis in the corresponding FMEA. These techniques, combined with programming disciplines
during the construction phases and dissimilarity in the subsequent testing of the
integrated hardware/software, give a high confidence that the similar software will be free
from common mode errors.

SUMMARY


A method for computer-assisted verification of systems controlled by microcode will be presented,
with examples of its application to actual implementations. Such systems may be a computer with
operations emulated by microcode action on a simple processor or a microprocessor coded to perform a fixed
task. The specifications for such a design and those for the processor on which it is to be implemented
are both described formally, with the code to be certified supplied as data to the low-level description.
Informally, correctness of the implementation means that if the specification description and the machine
description begin computation with identical inputs, they will get identical results. An interactive
system of programs for carrying out mathematical proofs of a formalization of this correspondence has been
written. Its application-independent monitor provides a framework for a goal-directed attack on the
problem, allowing the user to reduce it to subproblems; programs are invoked to perform symbolic interpretation
of the descriptions, generation of sufficient conditions for correctness, theorem proving, and
simplification. This system is running and has been used to detect and correct errors in an actual microcode
implementation; preliminary results of these experiments are described.

I. INTRODUCTION


A method  for  computer-assisted  verification  of  systems  controlled  by  microcode  will  be  presented,  with 
examples of its application to actual system implementations. Such a system is a computer with operations
emulated  by  microcode  action  on  a simple  processor  or  a microprocessor  coded  to  perform  a fixed  task. 

The increased use of microcode, residing in a control store or in a main memory, for implementation of
the control for such systems makes a demonstration of the correctness of the code a necessary part of the
design verification of the system. Indeed, such microcode may be more prone to subtle and obscure errors,
which are more difficult to detect using test cases, than programs in a higher-level language [23]. But
while  several  approaches  have  been  developed  for  proving  programs  correct  (see  [7,  12,  16,  31]),  little 
attention  has  been  given  to  the  verification  of  low-level  code.  Below  we  describe  a partially  automated 
system  which  has  been  used  to  detect  errors  in  microcode  and  to  certify  microprograms  as  correct. 

In proving the correctness of microprogrammed implementations, all of the facets of machine operation
must be explicitly described. Correctness of the description and a proof of the correspondence of the
assembled code to a given specification guarantee correct execution. However, particularly for microcode
implementations of high level architectures, assertions for correctness such as those due to Floyd [10]
are not easily formulated. Our approach is to give the specifications for correct implementation as an
abstract machine schema [3, 20] having a well specified tree control structure which operates upon a state
vector of machine components and is determined by a library of macro routines. The state vector components
and the macros are written in the Language for Symbolic Simulation (LSS), whose syntax and semantics will
be described subsequently. The attributes of the computer on which the specified architecture is to be
implemented are also embodied in such an abstract machine, and the desired relationships between the two
machines are specified and then established.

The  type  of  equivalence  we  wish  to  hold  in  order  to  prove  correctness  has  been  formalized  by  Milner 
as  algebraic  simulation  between  programs  [25].  As  related  to  our  abstract  machine  descriptions,  this 
notion, Symbolic Validation of Algorithmic Equivalence (SVAE), gives a mechanism for describing both the
points of control in the two machines at which a relationship between them should hold, and at each of
these  points  the  desired  relationship  between  the  components  of  the  state  vectors.  A proof  that  such  a 
relation  is  a simulation  relation,  in  the  sense  of  Milner,  is  then  a proof  that  whenever,  and  however, 
control  in  the  two  abstract  machines  goes  from  one  pair  of  corresponding  points  to  another  pair,  these 
relationships  will  hold.  The  theorems  validating  this  approach  will  be  proved. 

Previously, simple examples have been done using SVAE [3, 19, 20] to prove the correctness of
microcode. However, these proofs, though formal, were carried out by hand. They will be reviewed briefly so
that the basic ideas of algebraic simulation are made clear. For each pair of stopping points
corresponding to a simulation component we must: (1) assert that the simulation conditions corresponding to that
component hold, instantiate the most general values such that these conditions hold in the state vector,
and perform immediate simplification; (2) run each abstract machine, performing symbolic computation until
another stopping point is reached; (3) verify that the pair of points reached corresponds to a component
of the simulation relation; and (4) prove that the simulation conditions of this component hold. It
became  apparent  from  this  work  that  the  individual  parts  of  such  proofs  were  not  of  great  complexity,  and 
that  the  main  impediment  to  human  proofs  was  the  generation  and  organization  of  these  many  separate  parts. 
Since  this  number  of  parts  increases  with  the  size  of  the  implementation  being  verified,  it  became  clear 
that  some  automated  aid  would  be  necessary. 

An  interactive  system  of  programs  for  carrying  out  mathematical  proofs  of  a formalization  of  this 
simulation has been written. Its application-independent monitor provides a framework for a goal-directed
attack on the problem, allowing the user to reduce it to subproblems; programs are invoked to perform
symbolic interpretation of the descriptions, generation of sufficient conditions for correctness, theorem
proving, and simplification. This system, the Microprogram Certification System (MCS), will be briefly
described. It is running and has been used to detect and correct errors in an actual microcode
implementation; preliminary results of these experiments are described.

II. LANGUAGE FOR SYMBOLIC SIMULATION

The first step in the SVAE method is to define formally and precisely the algorithm and its
implementation. In any discussion of computer design, the natural levels of system description must be
carefully  defined.  The  architectural  level  contains  the  attributes  of  a computer  system  as  seen  by  a 
programmer;  i.e.  the  conceptual  structure  and  functional  behavior  as  distinct  from  the  organization  of 
data  flow  and  controls,  the  logical  design,  and  the  physical  implementation  [2].  Microprogramming  was 
first  proposed  as  a technique  for  the  orderly  design  of  logic  to  control  the  processor  data  flow.  This 
level  is  frequently  called  the  register-transfer  level.  For  several  systems  of  the  1960's,  such  as  the 
IBM System/360, simulators were devised which accepted as input computer descriptions at the
register-transfer level, and instructions for standard assembly level program debugging techniques [5]. However,
as the use of microcode for emulation, diagnostics and special functions increased, this technique of
microprogram certification became less viable. To validate a computer design using microcode, we first
define  formally  and  precisely  the  computer  at  both  the  architectural  and  register-transfer  levels.  The 
operation modeled is the successive execution of strings of instructions, beginning with the START button
being pushed and ending with either a STOP or an ERROR. As discussed in [18], the necessary primitives
for such a modeling system are: (1) a data set, (2) a closed set of operations over the data set, and
(3) a set of operations to control the order in which operations on the data set are performed. The data
set and operations should be chosen so that their application in specific instances will result in
implementations which can be easily manipulated to produce theorems whose satisfaction will prove the validity
of  the  given  design.  These  manipulations  and  theorems  should  lend  themselves  to  mechanical  treatment  via 
computer  assistance.  The  language  in  which  we  shall  describe  the  formal  models  which  allow  validation 
to be proven is the Language for Symbolic Simulation (LSS). The syntax of LSS will allow the data set
and  the  set  of  operators  to  be  described.  The  LSS  semantics  will  completely  describe  the  order  in  which 
operations  on  the  data  set  are  performed. 

The description of the syntax begins with a description of a facility vector, following [24]. The
facility vector is, for our present purposes, a list of the components of the machine: registers, storage
switches, lines, etc. We associate with each of these a shape, or dimension, in the style of APL.
Registers have associated with them their width in bits; main storage has the dimensions of a matrix. The
operations are defined by a macro library. The macro library for each description is a list of macro
definitions for an abstract machine with a tree control structure. Each definition consists of its name,
formal parameter list, and either a tree into which the macro expands, or several such trees whose
selection depends on predicates over elements of the facility vector. These trees contain other macro
calls and assignments of values of APL expressions [9] or values returned by macros to local variables or
to elements of the abstract syntax. The formal description of the LSS syntax in BNF follows:


facility-vector  = (varb-desc-1 ... varb-desc-n)
varb-desc        = (id shape) | (id shape permanent-value)
permanent-value  = APL-expression (over constants)
shape            = ordinal | (ordinal-1, ..., ordinal-n)
term             = facility-variable | local-variable | array-reference | PASS
arg              = APL-expression
exp              = arg | macro-call
macro-call       = (macro-name arg-1, ..., arg-n)
assignment       = (term : exp)
item             = macro-call | assignment
tree             = (root) | (root tree-1 ... tree-n)
root             = item
pred-expan       = (predicate tree) | (predicate pred-expan-1 ... pred-expan-n) {n > 1}
macro-def        = (macro-name param-list tree) | (macro-name param-list pred-expan-1 ... pred-expan-n) {n > 1}
param-list       = (id-1 ... id-n)
macro-library    = (macro-def-1 ... macro-def-n)

Figure 1 shows the facility vector for the architectural description of the S-machine, described in
Section V.

Figure 1. Example of facility vector showing dimension of each machine component
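
As a rough illustration of the `(id shape)` form of a varb-desc, a facility vector can be held as a list of name and shape pairs from which an initial state is allocated. The Python sketch below uses hypothetical components and sizes; it is not the facility vector of the S-machine in Figure 1.

```python
# Facility vector as (id, shape) pairs. The components are hypothetical.
FACILITY_VECTOR = [
    ("ic", (32,)),        # instruction counter: a register 32 bits wide
    ("acc", (32,)),       # accumulator: a register 32 bits wide
    ("mem", (16, 32)),    # main storage: a matrix of 16 words by 32 bits
]

def allocate(facility_vector):
    """Build a zeroed state with one entry per machine component."""
    state = {}
    for name, shape in facility_vector:
        if len(shape) == 1:
            state[name] = [0] * shape[0]
        else:
            rows, columns = shape
            state[name] = [[0] * columns for _ in range(rows)]
    return state

state = allocate(FACILITY_VECTOR)
```

Registers become bit vectors and main storage becomes a matrix, following the APL style of shape described in the text.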


Following Landin [17] and McCarthy [24], the LSS semantics will be defined by the operation of an abstract
machine

$M = (S, I, S_i, S_f, \rho, \omega, G)$

where $S$ is the set of all states,
$I$ is the set of input states,
$S_i$ is a specified set of initial states,
$S_f$ is a specified set of final states,
$\rho: S \to S$ is the transition relation mapping,
$\omega$ is a mapping from $I$ to $S_i$,
$G$ is a goal function map from $X$ to $\{0,1\}$, i.e. the desired action of $M$ is known.

The mapping $\rho$ maps states into states nondeterministically as defined by i) a macro library in LSS syntax
and ii) an algorithm $\delta$ which carries out the control steps necessary to interpret the LSS macros so
that the proper operations are carried out on the states. The algorithm $\delta$ will be discussed later.

The set of states $S$ is the catenation of the LSS facility vector and s-control, a
control tree with an LSS item at each node.

The mapping $\delta$ is defined briefly as follows. If the leaf L of the control tree is a macro, then L is
replaced by the expansion of the macro if the latter contains no predicates. If the macro has one or more
nested paths of predicates, the first path with all its predicates true is chosen; the tree structure at its
end replaces L. If L is an assignment statement, then either local or global variables are appropriately
modified. The detailed step by step algorithmic definition follows.

1) If s-control is the null tree $\Omega$, stop.

2) Choose any leaf L from s-control, and let L' be the leaf immediately above L.

3) If L has the form v:t, where v is a term and t is either a macro name or an APL expression, then
replace L on s-control by v, and add a new leaf t to s-control below L; go to step 1).

4) If L has the form PASS: t, where t is as in step 3), replace L by t; go to step 1).

5) If L is a macro name, find its expansion m in the macro-library, and perform proper passing of arguments.

6) If m has no predicates, replace L by m, performing proper local variable binding; go to step 1).

7) If m has predicates, choose the first predicate which is true, and replace m by the structure to the
right of this predicate; go to step 6).

8) If L is an APL expression (in particular, a variable name), evaluate it, assign it to L', and remove
both L and L' from s-control; go to step 1).

Note that when s-control = $\Omega$, $\delta$ is in effect the identity function. As an example, suppose s-control has
one leaf fetch-word(ic), where ic indicates the instruction counter. In the expansion

fetch-word(t) = PASS: mem[m;]
                m: 2⊥t[8+⍳24]

t is replaced everywhere by the value of ic, and the binding mechanism ensures that the value 2⊥ic[8+⍳24]
of the local variable m will in fact be inserted into the PASS statement, yielding PASS: mem[2⊥ic[8+⍳24];].
Thus the effect of this sequence is to fetch the current machine instruction from memory.
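
The net effect of the fetch-word expansion can be mimicked in a few lines of Python. This sketch does not reproduce the control-tree steps 1) to 8); it collapses them into ordinary function evaluation. It assumes 0-origin indexing, a 32-bit instruction counter whose last 24 bits hold the address, and a hypothetical four-word memory.

```python
def decode(bits):
    """Value of a bit vector read as a base-2 number (APL 2⊥bits)."""
    value = 0
    for bit in bits:
        value = 2 * value + bit
    return value

def fetch_word(state, t):
    """Sketch of fetch-word(t): bind the local m, then pass mem[m;]."""
    m = decode(t[8:32])        # m: 2⊥t[8+⍳24], taking bits 8..31 of t
    return state["mem"][m]     # PASS: mem[m;] yields row m of main storage

# Hypothetical state: the instruction counter holds address 2.
state = {
    "ic": [0] * 29 + [0, 1, 0],
    "mem": [[0] * 32, [0] * 32, [1] * 32, [0] * 32],
}
word = fetch_word(state, state["ic"])
```

Here `word` is the third row of `mem`, the machine instruction addressed by the instruction counter.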


1 The mapping $\omega$ is necessary because real machines physically start and stop (usually by actions such as pushing
buttons on the console); interior circuits are reset (again by a button) so that specified, known initial
states ($S_i$) are reached and instructions as defined in the program manuals can be executed.


In  validating  computer  design,  the  architectural  or  specification  level  machine  is  completely  defined 
by  formalizing  its  principles  of  operation  as  an  abstract  machine.  The  second  machine  specification  results 
in  a general  purpose  computer  controlled  by  the  instructions  in  the  control  store.  This  computer  is  made 
into  a special  purpose  machine  by  associating  the  actual  code  to  be  verified  as  a value  of  the  component 
which  contains  it  (e.g.  the  actual  microcode  in  the  control  store). 

After  machine  levels  have  been  formally  described,  in  order  to  validate  computer  designs  in  practical 
cases, abstract simulation will be used, so the method of Symbolic Validation of Algorithmic Equivalence
will  be  described. 

III.  SYMBOLIC  VALIDATION  OF  ALGORITHMIC  EQUIVALENCE 

The type of equivalence we wish to hold in order to prove correctness has been formalized by Milner
as algebraic simulation between programs [25]. As related to our abstract machine descriptions, this notion
gives a mechanism for describing both the points of control in the two machines at which a relationship
between them should hold and at each of these points the desired relationship between the components of the
state vectors (reducing the work over proving state equivalence). A proof that such a relation is a
simulation relation is then a proof that whenever, and however, control in the two abstract machines
reaches a pair of corresponding points, these relations will hold.

Assume that the algorithm $S'$ and its emulator $\mu S'$ have been formally described in LSS. To show that
$\mu S'$ is a correct implementation of $S'$ we shall employ two concepts. The first of these is an abstract
program.

Definition 3.1: Let $D$ be the union of mutually disjoint domains $D_{in}$, $D_{comp}$, and $D_{out}$. An abstract
program is a pair $P = \langle D, F \rangle$ where $F$ is a function satisfying

1) $F(D_{in} \cup D_{comp}) \subseteq D_{comp} \cup D_{out}$

2) $F(t) = t$ whenever $t \in D_{out}$.

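There are infinitely many ways to represent $S'$ and $\mu S'$ as respective abstract programs $\langle D', I' \rangle$ and
$\langle \mu D', \mu I' \rangle$. For example, we can write $D' = D'_{in} \cup D'_{comp} \cup D'_{out}$, where

$$D'_{in} = \{\, t \mid \text{is-}S'(t) \wedge \text{s-control}(t) = \text{start}(t) \,\} \tag{1}$$

$$D'_{comp} = \{\, t \mid \text{is-}S'(t) \wedge \text{s-control}(t) \neq \text{start}(t) \wedge \text{s-control}(t) \neq \Omega \,\} \tag{2}$$

$$D'_{out} = \{\, t \mid \text{is-}S'(t) \wedge \text{s-control}(t) = \Omega \,\} \tag{3}$$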

Next let $t \in D'_{in} \cup D'_{comp}$ and operate the interpreter: run the algorithm starting at step 1) and stopping
when step 1) is again reached; call the transformed abstract machine state $I'(t)$. We put $I'(t) = t$ when
$t \in D'_{out}$, which is in accordance with the definition of $\delta$. Programs $\langle \mu D', \mu I' \rangle$ are defined similarly by
replacing $S'$ with $\mu S'$ in (1), (2) and (3).

Observe that $\langle D', (I')^n \rangle$ is also a realization of $S'$ as an abstract program, $n = 1, 2, \ldots$, and in this sense
$\langle D', I' \rangle$ is the "finest" (i.e., most detailed) representation of $S'$ in which we will be interested. More
generally, the integer $n$ may become a function of the current machine state. For example, if $S'$ is known
to contain in its memory a machine language program which always terminates, the "coarsest" (least detailed)
representation of $S'$ would be $\langle D', F' \rangle$ with $F'(t) = (I')^n(t)$, where $n$ is the smallest integer for which
$(I')^n(t) \in D'_{out}$. We shall standardize this notation.

Definition 3.2: Let $E$ be a set. Then by

$$F'(t) = I'(t) \ \text{wrt}\ E$$

we mean that $F'(t)$ is defined to be $(I')^n(t)$, where $n$ is the smallest positive integer for which $(I')^n(t) \in E$.

(A similar notation will be used for $\mu F'$.) The point we wish to stress is that the execution function in
an  abstract  program  can  be  adapted  to  the  particular  problem  at  hand. 
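
Definitions 3.1 and 3.2 can be made concrete on a small finite example. The Python sketch below assumes a hypothetical abstract program whose states are the integers 0 to 5; the function `run_wrt` iterates the finest execution function until a state in the chosen set $E$ is reached, taking at least one step as the definition requires. The iteration limit is an addition of the sketch, needed because a real program may never reach $E$.

```python
def run_wrt(step, t, stop_set, limit=1000):
    """Return I^n(t) for the smallest positive n with I^n(t) in stop_set."""
    for _ in range(limit):
        t = step(t)
        if t in stop_set:
            return t
    raise RuntimeError("no state in stop_set reached within the limit")

# Hypothetical abstract program: D_in = {0}, D_comp = {1, 2, 3, 4},
# D_out = {5}.
D_OUT = {5}

def step(t):
    """Finest execution function I: one cycle; the identity on D_out."""
    return t if t in D_OUT else t + 1

# Coarsest representation: run until the output domain is reached.
coarse = run_wrt(step, 0, D_OUT)

# Intermediate representation: stop at chosen stopping points.
stopping_points = {3, 5}
first_stop = run_wrt(step, 0, stopping_points)
second_stop = run_wrt(step, first_stop, stopping_points)
```

Choosing a different `stopping_points` set gives a different execution function over the same states, which is the adaptability the text stresses.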

The  second  notion  we  need  is  that  of  simulation  between  abstract  programs. 

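Definition 3.3: Let $P = \langle D, F \rangle$, $P' = \langle D', F' \rangle$ be abstract programs and $R = R_{in} \cup R_{comp} \cup R_{out}$ be a relation,
with $R_{in} \subseteq D_{in} \times D'_{in}$, $R_{comp} \subseteq D_{comp} \times D'_{comp}$, and $R_{out} \subseteq D_{out} \times D'_{out}$.

1) $R$ is a weak simulation of $P$ by $P'$ if $\langle F(s), F'(t) \rangle \in R$ whenever $\langle s, t \rangle \in R$.

2) $R$ is a strong simulation of $P$ by $P'$ if $R$ is a weak simulation, $R_{in}$ is total, and $R_{comp}$ and $R_{out}$ are
one-to-one.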
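
For finite programs the two conditions of Definition 3.3 can be checked by enumeration. The Python sketch below uses two hypothetical three-state programs and represents each relation as a set of pairs; it is an illustration of the definition only, not of the symbolic proofs carried out by the system described in this paper.

```python
def is_weak_simulation(R, F, F_prime):
    """<F(s), F'(t)> is in R whenever <s, t> is in R."""
    return all((F(s), F_prime(t)) in R for (s, t) in R)

def is_total(R, domain):
    """Every element of `domain` is related to something."""
    return all(any(s == a for (a, _) in R) for s in domain)

def is_one_to_one(R):
    """No element appears in two different pairs on either side."""
    lefts = [s for (s, _) in R]
    rights = [t for (_, t) in R]
    return len(set(lefts)) == len(lefts) and len(set(rights)) == len(rights)

# Hypothetical program P: 0 -> 1 -> 2, with 2 in D_out (a fixed point).
def F(s):
    return min(s + 1, 2)

# Hypothetical program P': 0 -> 10 -> 20, with 20 in D'_out.
def F_prime(t):
    return min(t + 10, 20)

D_in = {0}
R_in, R_comp, R_out = {(0, 0)}, {(1, 10)}, {(2, 20)}
R = R_in | R_comp | R_out

weak = is_weak_simulation(R, F, F_prime)
strong = (weak and is_total(R_in, D_in)
          and is_one_to_one(R_comp) and is_one_to_one(R_out))

# A relation that pairs the wrong states is rejected.
bad = is_weak_simulation({(0, 0), (1, 20)}, F, F_prime)
```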

As Milner points out, we have the equivalence

    $\langle F(s), F'(t) \rangle \in R \;\; \forall \langle s, t \rangle \in R \iff RF' \subseteq FR.$    (1)

This second characterization of weak simulation states that R is a weak homomorphism between P and P'. For
additional approaches toward simulation which exploit the categorical structure implicit here, consult
Goguen [14] and Burstall [6]. Note that there always exists a nontrivial weak simulation between any two
programs which never terminate. In fact, if $P = \langle D, F \rangle$ and $P' = \langle D', F' \rangle$ satisfy $F(d) \in D_{comp}$ $\forall d \in D$ and
$F'(d') \in D_{comp}'$ $\forall d' \in D'$, respectively, then it is easy to see that $R = D \times D'$ is a weak simulation of P by P'.
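For finite state sets, the two characterizations of weak simulation can be compared directly. The following Python sketch assumes relations are sets of pairs, functions are dictionaries, and composition is read left to right (so RF' means "R, then F'"). The helper names and the toy counters are illustrative assumptions, not part of the source.

```python
def compose(A, B):
    """Left-to-right relational composition: {(a, c) | (a, b) in A, (b, c) in B}."""
    return {(a, c) for (a, b) in A for (b2, c) in B if b == b2}


def graph(f):
    """View a finite function (dict) as a set of ordered pairs."""
    return set(f.items())


def is_weak_simulation(R, F, F_prime):
    """Pointwise form: (F(s), F'(t)) is in R whenever (s, t) is in R."""
    return all((F[s], F_prime[t]) in R for (s, t) in R)


def satisfies_inclusion(R, F, F_prime):
    """Relational form: R F' is a subset of F R."""
    return compose(R, graph(F_prime)) <= compose(graph(F), R)


# Toy programs: P counts mod 4, P' counts mod 2; R relates s to its parity.
F = {s: (s + 1) % 4 for s in range(4)}
F_prime = {t: (t + 1) % 2 for t in range(2)}
R_good = {(s, s % 2) for s in range(4)}
R_bad = {(0, 0)}
```

On this example both tests accept `R_good` and both reject `R_bad`, as the equivalence predicts.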

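We can give an intuitive explanation of a strong simulation R of P by P'. Set $Q = R_{comp} \cup R_{out}$, and let
$s \in D_{in}$. Then $\exists t \in D_{in}'$ with $\langle s, t \rangle \in R$; thus $\langle F^n(s), F'^n(t) \rangle \in Q$, whence $F^n(s) = Q^{-1}F'^n(t)$, $n = 1, 2, \ldots$
(here $Q^{-1}$ is single-valued because $R_{comp}$ and $R_{out}$ are one-to-one). Consequently, P' can compute anything computed by P.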

Our definition of strong simulation differs from that of Milner in two ways. First, we need $R_{comp}$ to
be both total and one-to-one, whereas Milner imposes no conditions on $R_{comp}$. In our case it could happen
that the memory of S' contains a nonterminating program of machine instructions, so that $\langle D', F' \rangle$ never leaves
$D_{comp}'$. Nevertheless, we want to show that the micromachine correctly emulates such a program. Thus we
need the added assumptions in the computation domain, not merely in the output domain. Second, we relax
the requirement that $R_{in}$ and $R_{out}$ be both total and one-to-one, because our weakened hypotheses are
sufficient to establish Milner's Theorem 3.4 (see the parenthetical statement following his proof).


In order to simplify the proofs of simulation by breaking them into parts (see [20]), we represent the
simulation relation between abstract machines in a way which differs from the formal notation of Milner.
Usually the points of control at which we are interested in establishing a correspondence (the stopping
points at which the state vector values determine the intermediate domains which define F) are a small
subset of all possible values of the LSS control tree. Since parallel operations are described by multiple
leaves on control trees [1], the question of the determinacy of F arises. Our approach thus far has been
to show that they can be transformed into linear trees (i.e., that the order of selection of operations is
immaterial). (See [28, 30].) Therefore (see [8, 21]), we decompose the simulation relation R into components
$R_1, \ldots, R_n$, one for each pair of control points at which a relation is to be established. Each component
contains both control information, specifying for each machine the point of control at which a correspondence
must hold, and simulation conditions, detailing the desired correspondence. The control information usually
consists of a certain form of control tree, but may also include predicates (stopping conditions) over the
state vector variables which further constrain the points of simulation. The simulation conditions, or
simultaneous verification conditions [21], are in general predicates (usually equalities) relating the
state vector variables of the two machines. A sample simulation component appears in Figure 3; Figure 4
illustrates the simulation problem.

This decomposition of the simulation relation suggests a decomposition of the problem of proving
simulation. With the relation partitioned in this way, we must, for each pair of stopping points
corresponding to a simulation component: (1) assert that the simulation conditions corresponding to that
component hold; (2) run each abstract machine until another stopping point is reached; (3) verify that
the pair of points reached corresponds to a component of the simulation relation; and (4) prove that the
simulation conditions of this component hold. Below we describe the theorems which show that this procedure
is valid.
IV. THEOREMS ON WEAK SIMULATION

In this section we state and prove two theorems useful for establishing weak simulation. Applications
will appear in the next section. Often it is convenient to break up the execution function F into a
composition $F = F_1 F_2 \cdots F_n$ and examine the machine states at each intermediate point. The situation can be
depicted by the diagram in Fig. 5 for the abstract programs $P = \langle D, F \rangle$ and $P' = \langle D', F' \rangle$, where $D = D_0 = D_n$
and $D' = D_0' = D_n'$. As an example, we could take n = 2, and $F_1$ and $F_1'$ could execute IFETCH for S' and
μS', respectively, while $F_2$ and $F_2'$ could perform IEXECUTE. Our first theorem determines under what
conditions such a diagram yields a weak simulation.

Fig. 5. Diagram for Theorem 4.1

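Theorem 4.1: Let $R \subseteq D_{in} \times D_{in}' \cup D_{comp} \times D_{comp}' \cup D_{out} \times D_{out}'$ be a relation, and suppose there exist
domains

    $D = D_0, D_1, \ldots, D_{n-1}, D_n = D$
    $D' = D_0', D_1', \ldots, D_{n-1}', D_n' = D'$

and functions

    $F_k : D_{k-1} \to D_k$,  $F_k' : D_{k-1}' \to D_k'$,  $k = 1, \ldots, n$

such that

    $F = F_1 F_2 \cdots F_n$
    $F' = F_1' F_2' \cdots F_n'$.

1) If there exist relations

    $R_0 = R, R_1, \ldots, R_{n-1}, R_n = R$,  $R_k \subseteq D_k \times D_k'$

such that the diagram in Fig. 5 semicommutes, then R is a weak simulation of P by P'.

2) Conversely, if R is a weak simulation of P by P', then there exist relations

    $R_0 = R, R_1, \ldots, R_{n-1}, R_n = R$,  $R_k \subseteq D_k \times D_k'$

such that the diagram in Fig. 5 semicommutes.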


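Proof:

1) To say that the diagram semicommutes means that $R_{k-1} F_k' \subseteq F_k R_k$, $k = 1, \ldots, n$. Thus

    $RF' = R_0 F_1' \cdots F_n' \subseteq F_1 R_1 F_2' \cdots F_n' \subseteq F_1 F_2 R_2 F_3' \cdots F_n' \subseteq \cdots \subseteq F_1 F_2 \cdots F_n R_n = FR$

and we conclude by (1) that R is a weak simulation of P by P'.

2) To prove the second assertion, set $R_0' = R$ and

    $R_k' = \{ \langle F_k(s), F_k'(t) \rangle \mid \langle s, t \rangle \in R_{k-1}' \}$,  $k = 1, \ldots, n$.    (2)

We claim that $R_n' \subseteq R$; to see this, let $\langle s_n, t_n \rangle \in R_n'$. Then $\exists \langle s_{n-1}, t_{n-1} \rangle \in R_{n-1}'$ such that
$\langle F_n(s_{n-1}), F_n'(t_{n-1}) \rangle = \langle s_n, t_n \rangle$. Continuing in this way, we obtain $\langle s_{k-1}, t_{k-1} \rangle \in R_{k-1}'$ with
$\langle F_k(s_{k-1}), F_k'(t_{k-1}) \rangle = \langle s_k, t_k \rangle$, $k = 1, \ldots, n$. Thus we have $\langle s_0, t_0 \rangle \in R$ and $\langle t_0, t_n \rangle \in F'$, so that

    $\langle s_0, t_n \rangle \in RF' \subseteq FR$,    (3)

    $\langle s_0, s_n \rangle \in F$.    (4)

But (3) says $\exists u$ with $\langle s_0, u \rangle \in F$ and $\langle u, t_n \rangle \in R$; thus (4) forces $u = s_n$; hence $\langle s_n, t_n \rangle \in R$, and $R_n' \subseteq R$, as
claimed.

Next,

    $R_{k-1}' F_k' \subseteq F_k R_k'$,  $k = 1, \ldots, n$

by (1) and (2). Thus the second part of the theorem follows if we take

    $R_k = R_k'$,  $k = 0, \ldots, n-1$,  and  $R_n = R$.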
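Part 1 of Theorem 4.1 can be checked on a small finite example. The Python sketch below assumes left-to-right composition and represents functions as dictionaries. The two-stage toy machines and all helper names are assumptions made for illustration only.

```python
def compose(A, B):
    """Left-to-right relational composition of two sets of pairs."""
    return {(a, c) for (a, b) in A for (b2, c) in B if b == b2}


def graph(f):
    """View a finite function (dict) as a set of ordered pairs."""
    return set(f.items())


def then(f, g):
    """Left-to-right composition of finite functions: apply f, then g."""
    return {x: g[f[x]] for x in f}


def semicommutes(R_prev, F_k, F_k_prime, R_next):
    """One square of the diagram: R_{k-1} F_k' is a subset of F_k R_k."""
    return compose(R_prev, graph(F_k_prime)) <= compose(graph(F_k), R_next)


# Two-stage toy machines (n = 2): P works mod 4, P' works mod 2.
F1 = {s: (s + 1) % 4 for s in range(4)}
F2 = {s: (s + 2) % 4 for s in range(4)}
F1p = {t: (t + 1) % 2 for t in range(2)}
F2p = {t: t for t in range(2)}              # adding 2 mod 2 is the identity
R = {(s, s % 2) for s in range(4)}
relations = [R, R, R]                       # R_0 = R, R_1, R_2 = R

stages = [(F1, F1p), (F2, F2p)]
squares_ok = all(
    semicommutes(relations[k - 1], Fk, Fkp, relations[k])
    for k, (Fk, Fkp) in enumerate(stages, start=1)
)

# Compose the stages and test the overall condition R F' subset of F R.
F, Fp = then(F1, F2), then(F1p, F2p)
weak = compose(R, graph(Fp)) <= compose(graph(F), R)
```

Here every square semicommutes, and the composed functions satisfy the inclusion of (1), as the theorem requires.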

Theorem 4.1 treats the case when F and F' can be written as function compositions. However, F can also
be viewed as a set of ordered pairs, and such a set can be broken into mutually disjoint subsets. This
situation is illustrated in Fig. 6.

Fig. 6. Diagram for Theorem 4.2

As an example, we mention that sometimes machine instructions can be broken into two classes: the
one-address instructions, which calculate an address operand; and the zero-address instructions, which
manipulate the stack and do not compute an address. We can take n = 2 and let $F_1$ and $F_1'$ correspond to
the fetching of a zero-address instruction and $G_1$ and $G_1'$ correspond to its execution. $F_2$ and $F_2'$ do the
fetching and the address calculation for a one-address instruction, while $G_2$ and $G_2'$ perform the remainder
of the instruction execution. Our second theorem deals with such a function decomposition.

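Theorem 4.2: Let F and F' be broken into decompositions as shown in Fig. 6, and let
$R \subseteq D_{in} \times D_{in}' \cup D_{comp} \times D_{comp}' \cup D_{out} \times D_{out}'$, $R_{jk} \subseteq D_j \times D_k'$ be relations, $j, k = 1, \ldots, n$. If

    $RF_k' \subseteq \bigcup_{j=1}^{n} F_j R_{jk}$,  $k = 1, \ldots, n$

and

    $R_{jk} G_k' \subseteq G_j R$,  $j, k = 1, \ldots, n$

then R is a weak simulation of P by P'.

Proof: We have

    $F = \bigcup_{j=1}^{n} F_j G_j$,  $F' = \bigcup_{k=1}^{n} F_k' G_k'$.

Therefore

    $RF' = R \bigcup_{k=1}^{n} F_k' G_k' = \bigcup_{k=1}^{n} (RF_k') G_k' \subseteq \bigcup_{k=1}^{n} \bigcup_{j=1}^{n} F_j R_{jk} G_k'$

    $= \bigcup_{j=1}^{n} F_j \bigcup_{k=1}^{n} R_{jk} G_k' \subseteq \bigcup_{j=1}^{n} F_j G_j R = FR$

and the proof is complete, by (1).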

To give substance to these abstract ideas, some simple machines will be described and simulation proofs
sketched.

V. SIMPLE EXAMPLES OF MICROPROGRAM VALIDATION


The S machine is a simple hypothetical computer described by Haralson and Polivka [15] and is similar
to the stack machine of Gear [11]. The stack is in main memory, which is an array SMEM of 2 32-bit words.
Additional components in the architecture of S include a one-bit switch SSW, an index register SX, an
instruction counter SCC, and a stack pointer SSTK, each of size 32 bits. The LSS facility vector for the
S machine is shown in Figure 1.

There are 27 machine instructions, which are divided into 14 one-address and 13 zero-address
instructions. The one-address instructions need an address operand; the zero-address instructions manipulate
the stack and require no operand. The basic LSS macro showing the operation of this computer on the
architectural level is shown in Figure 2. If the switch SSW is set, the computer is running, so the macro
fetchword (SCC) gets the next instruction from memory. The instruction word is broken into parts (by
parallel operations) for indirect addressing (id), indexed addressing (ix), op code (op) and address (ad).
These values are substituted into the instruction fetch part (instrprep and advctr) to determine the correct
address and to advance the instruction counter, and are also substituted into the instruction execute macro
(execinstr). This basic macro calls itself recursively, so the computer will run until the switch SSW is
turned off.
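The control structure of this basic macro can be sketched as an ordinary fetch-decode-execute loop. The Python below is a loose illustration, not the LSS macro of Figure 2: the bit layout of the instruction word, the order of indexing and indirection, and the two-opcode instruction set (HALT, LOADX) are all assumptions. The recursion of the original macro is replaced by a loop.

```python
from dataclasses import dataclass, field

AD_BITS = 25   # assumed width of the address field (not given in the text)
OP_BITS = 5    # assumed width of the op-code field


@dataclass
class SMachine:
    mem: dict = field(default_factory=dict)  # sparse SMEM: address -> word
    ssw: int = 1                             # run switch
    scc: int = 0                             # instruction counter
    sx: int = 0                              # index register
    sstk: int = 0                            # stack pointer (unused in this sketch)


def encode(id_, ix, op, ad):
    """Pack the four fields into one word under the assumed layout."""
    return (id_ << 31) | (ix << 30) | (op << AD_BITS) | ad


def decode(word):
    """Split an instruction word into (id, ix, op, ad)."""
    ad = word & ((1 << AD_BITS) - 1)
    op = (word >> AD_BITS) & ((1 << OP_BITS) - 1)
    ix = (word >> 30) & 1
    id_ = (word >> 31) & 1
    return id_, ix, op, ad


def instrprep(m, id_, ix, ad):
    """Effective address: optional indexing, then optional indirection (assumed order)."""
    addr = ad + (m.sx if ix else 0)
    if id_:
        addr = m.mem.get(addr, 0)
    return addr


def execinstr(m, op, addr):
    """Hypothetical two-instruction set, for illustration only."""
    if op == 0:        # HALT: turn the run switch off
        m.ssw = 0
    elif op == 1:      # LOADX: load the index register from memory
        m.sx = m.mem.get(addr, 0)


def run(m):
    while m.ssw:                              # the macro repeats while SSW is set
        id_, ix, op, ad = decode(m.mem.get(m.scc, 0))   # fetchword(SCC)
        addr = instrprep(m, id_, ix, ad)
        m.scc += 1                            # advctr
        execinstr(m, op, addr)
    return m


machine = run(SMachine(mem={0: encode(0, 0, 1, 10), 1: encode(0, 0, 0, 0), 10: 7}))
```

After the run, `machine.sx` holds the word loaded from address 10 and the switch is off.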

Fig. 7. Data flow for μS.

The μS machine has entities MEM, SW, X, CC, and STK corresponding to the components of S. It possesses in
addition general purpose registers A and B and a memory data register MDR, each of size 32 bits; a 24-bit
memory address register MAR; a five-bit instruction register IR; and a control store CONTROL containing
a maximum of 300 sixteen-bit microinstructions.


The original hand proof of simulation had two pairs of domains or stopping points: begin instruction ($R_0$)
(isomorphic to end instruction) and machine stopped ($R_g$).

The microcode μCODE2, which was proved correct [3], has 172 microinstructions. This code has a
major loop beginning when a machine instruction is to be executed. This loop is divided into two sections.
In the first, IFETCH, the operands for the current instruction are determined. For zero-address
instructions the given operand is used; for one-address instructions indexing and/or indirect addressing is
performed. In the second section, IEXECUTE, the machine instruction operation code is used to calculate
a branch to a segment of microcode which performs that instruction. Each of these segments is straight-line
code except for the handling of the two shift instructions, each of which contains a simple loop.

In spite of the fact that μCODE2 had been debugged and simulated, three errors were found and corrected.

A more interesting data flow for the same computer architecture is shown in Figure 8 [20].

Fig. 8. Data flow for μS'.

The design has the flavor, but not the complexity, of the System/360, Model 40. In addition to the
instruction counter (IC) and index register (X) available to a programmer, the microprogrammer can
access two general purpose registers (A and B), a special testing register (D), the memory address
register (MAR), the memory data register (MDR), a five-bit instruction register (IR), and collections of
constants (EMITS 1,2). Note that the stack no longer resides in main memory. The hardware can examine
the contents of register D and detect addition/subtraction overflow, stack overflow, and algebraic
left-shift overflow; such tests and error checking are performed by the five staticizers (STATS). The
read/write operations each extend over three microcycles. The μS' machine is horizontally microprogrammed, and
the 44-bit microinstruction contains 10 fields as shown in Figure 9. Bits 0 through 11 contain the address
of the successor instruction; however, the machine computes the effective address by performing the logical
disjunction of the stats and the five low-order bits of nadr. The other fields include a testing field,
a memory field, an arithmetic-logical unit function field, two emit fields (which provide needed numerical
constants), and one field for each of the four buses.
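The successor-address rule amounts to a bitwise OR on the low-order bits. The Python sketch below assumes the five staticizer outputs are packed into a 5-bit integer aligned with the five low-order bits of the 12-bit nadr field. That packing and the function name are assumptions made for illustration.

```python
NADR_BITS = 12          # "bits 0 through 11" of the microinstruction
STAT_MASK = 0b11111     # five staticizers


def effective_next_address(nadr: int, stats: int) -> int:
    """OR the five stat bits into the five low-order bits of nadr."""
    if not 0 <= nadr < (1 << NADR_BITS):
        raise ValueError("nadr must fit in 12 bits")
    return nadr | (stats & STAT_MASK)


# With nadr = 32 (low five bits all zero), the stats select one of 32 successors.
target = effective_next_address(0b000000100000, 0b00101)
```

If the low five bits of nadr are zero, the microprogram gets a multi-way branch on the test outcomes without a separate branch instruction. A stat bit has no effect wherever the corresponding nadr bit is already 1.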
Fig. 9. Stats and microinstruction format for μS'.

Using a self-explanatory mnemonic language, we present the first three microinstructions of the
microprogram which emulates the S machine instruction set:

micromemory   ALU fcn,   IN      IN      OUT     OUT
location      test       Bus 1   Bus 2   Bus 3   Bus 4   Memory    NADR

    0         ADD        IC      ONE     IC      MAR     READIN     1
    1         ADD        ZERO    FOUR    A                          2
    2         NOT        ZERO    MASK    B       D       READOUT    5

The first microinstruction fetches the current machine instruction and updates the IC; the instruction
becomes available in D two cycles later. The second and third prepare for interpretation of the fetched
instruction. The microprogram consists of two parts: IFETCH (micromemory locations 0-15), which fetches
the current machine instruction and handles possible indirect addressing; and IEXECUTE (locations 16-90),
which executes the instruction. After completion of IEXECUTE, control is always returned to micromemory
location 0. There is no temporal overlap between IFETCH and IEXECUTE.

To show that any machine language program in S is correctly implemented by μS', we must exhibit a
strong simulation relation R between S and μS'. We first set

    $W = \{ \langle u, w \rangle \in S \times \mu S' \mid (mem(u) = mem(w)) \wedge (sw(u) = sw(w))$
        $\wedge\ (stk(u) = stk(w)) \wedge (ic(u) = ic(w)) \wedge (x(u) = x(w))$
        $\wedge\ (stack(u) = stack(w)) \}$,

which expresses the conditions we want to hold whenever we examine the machine states at intermediate
times. That is, S and μS' should appear the same from the machine language programmer's point of view.
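Read as a predicate, W compares only the programmer-visible components of the two states. A minimal Python sketch, assuming each machine state is a dictionary keyed by component name (the dictionary representation and the sample values are illustrative assumptions):

```python
# Components visible to the machine language programmer, as listed in W.
VISIBLE = ("mem", "sw", "stk", "ic", "x", "stack")


def in_W(u: dict, w: dict) -> bool:
    """True when S-state u and micromachine state w agree on every visible component."""
    return all(u[name] == w[name] for name in VISIBLE)


s_state = {"mem": (0, 0), "sw": 1, "stk": 3, "ic": 7, "x": 0, "stack": (5, 6, 7)}
# The micromachine carries extra registers (a, b, d, mar, ...) that W ignores.
micro_state = dict(s_state, a=4, b=0, d=0, mar=7)
```

The micromachine-only registers do not enter the comparison, which is what lets the two machines differ internally while still being related by W.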

We must now verify the condition $RF' \subseteq FR$ for each machine instruction i. However, by Theorem 4.2,
we may break the proof into several pieces. We first show that indirect addressing is performed properly
by the relevant microcode in IFETCH for an arbitrary machine instruction [20]. When we enter IEXECUTE,
we may encounter straight-line code or possible loops.

The case of straight-line code is easy to handle. A sample proof for the ADD machine instruction,
which adds the two top levels of the stack and places the result at the top of the stack, appears in
Figure 10. The proofs of the conditions to be verified assume that condition W held at the beginning of
the ADD instruction execution. The action of the S machine is symbolically executed in the left hand
column. The first instruction gets the word in the architectural stack (stack(u)) which is addressed by
the value of the word in the stack pointer (stk(u)) and puts it in temporary (virtual) storage s0.
Next 1 is added to the stack pointer, using APL operators, then the next word in the stack is accessed
and stored in virtual storage s2. Then the two words are added and the result stored in the stack. The
action of the μS' emulation, shown in the right hand column, is guided by the microprogram μP1. After
this emulation, which uses the μS' registers A, B and STK, the verification conditions are that the new
values at the top of the stacks are equal. This proof is straightforward for a human.


        S machine                                    μS' emulation

s0: stack(u)[2⊥stk(u);]                        A ← n0: stack(w)[2⊥stk(w);]
stk(u) ← s1: (32ρ2)⊤(2*32)|1+2⊥stk(u)          stk(w) ← n1: (32ρ2)⊤(2*32)|(2⊥stk(w))+2⊥(31ρ0),1
s2: stack(u)[2⊥stk(u);]                        B ← n2: stack(w)[2⊥stk(w);]
s3: (32ρ2)⊤(2*32)|(2⊥s0)+2⊥s2
stack(u)[2⊥stk(u);] ← s4: s3                   stack(w)[2⊥stk(w);] ← n3: (32ρ2)⊤(2*32)|(2⊥n0)+2⊥n2

Verify: (1) s1 = n1
        (2) s4 = n3

Case I: overflow
sw(u) ← s5: (2ρ2)⊤2                             sw(w) ← n4: (2ρ2)⊤2
S-end                                           μS'-end

Verify: (3) s5 = n4

Case II: no overflow
S-end                                           μS'-end

Verify: empty

Figure 10. Conditions to be verified for ADD instruction.


Let us change the microprogram to a more realistic emulation to see how the problem of validation
is affected. Recall that there was no temporal overlap between the operation of IFETCH and IEXECUTE.
We now introduce fetching of the next machine instruction during IEXECUTE, with the goal of keeping the
main storage operating at all possible times and thus achieving maximum performance with this data flow
and μP design. To do so, let us remark that the machine instructions fall into three categories:

TYPE 1. Instructions involving no branching or memory reference, e.g. stack manipulation
instructions such as ADD;

TYPE 2. Conditional or unconditional branching instructions (which of course do not reference
memory);

TYPE 3. Instructions which reference memory, such as STORE and LOAD.

We now revise the IEXECUTE portion of our microprogram as follows, depending on the type of machine
instruction being emulated.

TYPE 1. The first statement of the corresponding IEXECUTE section starts fetching the next
instruction; that is, it has the form of the microinstruction in location 0. The current
instruction is emulated correctly, then control passes to location 2 of IFETCH.

TYPE 2. The test for branching is determined, and the ic is set accordingly. The next instruction
is fetched, and control passes to location 1, because the information from memory will not be available
until a cycle later (see READOUT in location 2).

TYPE 3. No changes are made. Memory is referenced, then control passes to location 0.

It is clear that the overlap of fetching and executing instructions yields a more efficient machine
design. If we replace the old microprogram by the revised program μP2, we obtain a new abstract machine μS''.

To show that μS'' is a correct implementation of S, we could proceed as before. However, since
strong simulation is transitive, it is sufficient to exhibit a strong simulation between μS' and μS''.
To do so we divide the computation domain into several pieces, as illustrated in Figure 11. The domains
$\mu D_2$, $\mu D_2'$ indicate the states of the respective machines after execution of the microinstruction in
location 1. The domains $\mu D_k'$ indicate the states of μS'' after execution of instructions in IEXECUTE
which are similar to those in location 1. The simulation relation R' is defined by register identity
at the points of correspondence indicated. For example, points in $\langle \mu D_2, \mu D_2' \rangle$
must satisfy the requirement that the quantities stack, stk, a (whose contents must be 4), b, d, ic,
x, mem, and ir are equal. However, points in $\langle \mu D_{out}, \mu D_{out}' \rangle$ would merely require the equality of stack
and stk - the only quantities which could be affected between IEXECUTE and stopping of the machine,
e.g. an error due to stack overflow. It is then a straightforward matter to show that R' is in fact
a strong simulation of μS' by μS''.

Figure 11. Equivalences between two micromachines μM and μM'.


In carrying out these formal proofs by hand, it became apparent that the individual parts of such
proofs were not of great complexity, but that the main impediment to human proofs was the generation and
organization of these many separate parts. Since the number of parts increases with the size of the
implementation being verified, it became clear that some automated aid would be necessary. In the
following we shall describe the Microprogram Certification System (MCS), an interactive system designed
to aid in proving simulation between programs.

VI.  THE  MCS  SYSTEM 

MCS (for "Microprogram Certification System") is written in LISP and provides interactive aid for
proving that a stated relation between two machine descriptions is a simulation. A critical problem in
the hand proofs of microprogram correctness [3, 19, 20] was to assure that all pairs of simultaneous paths
(from one component of the simulation relation to the other) were taken and all theorems generated and
proved. In MCS this bookkeeping function is embodied in an interactive, goal-directed, and
problem-independent supervisor (^]. From the user's point of view, this supervisor provides a uniform interface
through which he manipulates a tree of goals. Using a set of standard commands, he controls the direction
of the proof of simulation and observes its progress. The components of MCS which do the actual work of
proving simulation - the path tracer, simplifier, verification condition generator, theorem prover, etc. -
are invoked by the user through this supervisor.

The data structure upon which the MCS supervisor operates is an AND goal tree, and the user manipulates
this tree in a problem-reduction fashion [4, 27]. Initially the tree is empty; the user supplies a first
goal, which embodies the problem which he wishes to solve, and which becomes the current goal, or focus
of attention. Each goal consists of a flag indicating whether or not the goal has been achieved, a goal
class or list of functions for attacking the goal, and a pattern describing what the problem represented
by the goal is. To act upon the current goal, the user invokes one of the rules in the goal class. This
LISP function is then called with the elements of the pattern as arguments. The rule may achieve the
goal directly or may generate one or more subgoals of the goal, the achievement of all of which is
sufficient to establish the original goal. If subgoals are generated, the first of them becomes the current
goal. If a goal is achieved, its nearest unproved brother becomes current; when all sons of a goal are
achieved, that goal is marked as achieved and the search for an unproved goal is repeated. The original
problem is solved when the initial goal is achieved.

In addition to the two operations of adding a new goal by specifying its goal class and pattern, and
invoking a function upon the current goal, a new current goal from the tree may be selected by specifying
its index number. (Goals are indexed in a Dewey-decimal fashion.) This flexibility allows partial
solutions to problems.
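The bookkeeping performed by such a supervisor can be sketched in Python (MCS itself is written in LISP). Everything below is an illustrative assumption rather than the MCS implementation: the class and method names, the convention that a rule returns a list of (pattern, goal class) pairs or an empty list when it achieves the goal outright, and the simplification that "nearest unproved brother" means the first unachieved sibling.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(eq=False)
class Goal:
    pattern: Tuple                       # arguments describing the problem
    goal_class: Dict[str, Callable]      # rule name -> function over the pattern
    achieved: bool = False
    parent: Optional["Goal"] = None
    sons: List["Goal"] = field(default_factory=list)


class Supervisor:
    def __init__(self, pattern, goal_class):
        self.root = Goal(pattern, goal_class)
        self.current: Optional[Goal] = self.root

    def invoke(self, rule_name):
        """Apply a rule of the current goal's class to the elements of its pattern."""
        goal = self.current
        subgoals = goal.goal_class[rule_name](*goal.pattern)
        if subgoals:                     # AND node: every son must be achieved
            goal.sons = [Goal(p, c, parent=goal) for p, c in subgoals]
            self.current = goal.sons[0]
        else:
            self._achieve(goal)

    def select(self, dewey):
        """Make the goal with Dewey index such as '1.2' the current goal."""
        goal = self.root
        for position in dewey.split(".")[1:]:
            goal = goal.sons[int(position) - 1]
        self.current = goal

    def _achieve(self, goal):
        goal.achieved = True
        parent = goal.parent
        if parent is None:               # initial goal achieved: problem solved
            self.current = None
            return
        pending = [g for g in parent.sons if not g.achieved]
        if pending:
            self.current = pending[0]
        else:
            self._achieve(parent)        # all sons done, so the parent is done


# Toy goal classes: SPLIT makes one subgoal per item, CHECK succeeds directly.
CHECK = {"check": lambda x: []}
SPLIT = {"split": lambda items: [((x,), CHECK) for x in items]}

sup = Supervisor(([10, 20],), SPLIT)
sup.invoke("split")      # current goal becomes 1.1
sup.select("1.2")        # work on the second subgoal first
sup.invoke("check")      # 1.2 achieved; current returns to 1.1
sup.invoke("check")      # 1.1 achieved, so the initial goal is achieved
```

Selecting goal 1.2 before 1.1 shows how choosing a goal by index lets the user work on subproblems in any order.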

The initial goal entered by the user in proving simulation between programs indicates that the
problem is to prove that a given simulation relation holds between two given machine descriptions. The pattern
of such a goal is a list of five items: the state vector and macro library of each machine, and the
simulation relation (in the form specified above). These items are normally read from a file. The class
of such an initial goal, called SIMULATE, contains a single rule, RSPLIT, which generates one subgoal for
each of the components of the simulation relation. Each of these subgoals, of class TRACEIT, has in its
pattern the state vector, control tree, and macro library for each machine, a current predicate list
of assumptions, and the complete simulation relation. The control tree of each machine is taken from the
component of the simulation relation to which the subgoal corresponds. The state vector is formed by
prefixing "$" to the name of the machine component. The predicate list is created from the predicates
given in the stopping conditions and simulation conditions of the simulation component, although some of
these predicates may not appear explicitly in the predicate list but may be reflected in the initial values
of the state vector quantities. Figure 12 shows the elements of the pattern of a goal of class TRACEIT,
generated by RSPLIT from the simulation component of Fig. 3.

To achieve each of the goals generated by RSPLIT, the user must show that starting from each state, and
running both abstract machines to their next stopping points, the simulation conditions of the pair of points
reached hold. The proof is completed by showing that the theorems of [21] are satisfied; then, since the
initial conditions are true, the output conditions are valid. Since there is a very large number of
possible execution sequences of the two machines (corresponding to the number of possible actual values of the
state vector quantities), symbolic execution [4, 7, 16] is used. Invocation of the rules TRACEM or TRACEMUM
(for the algorithmic level and the implementation level, respectively) initiates the symbolic interpreter
on the control tree and state vector of the goal.

Figure 12. A Goal of Class TRACEIT


The abstract, or symbolic, interpreter carries out the symbolic execution of each machine by employing
the algorithm in Section II. A more detailed description of the path tracing portion of MCS is given in
[22]. As discussed in Section II, at each step of the algorithm either the control tree s-control is
modified, or either local or global variables are appropriately modified. At each step the simulation
relation is checked to determine whether a stopping point has been reached; when one has, interpretation
halts.

This procedure is complicated by the fact that the state vector elements have symbolic, rather than
actual, values. This chiefly affects two aspects of the interpretation. First, the expressions in
assignment statements are symbolic, and cannot usually be evaluated to numeric or boolean values. Second, when
predicates are encountered in expanding macros, they cannot always be evaluated to "true" or "false".

Both of these problems are partially solved by a simplifier, which performs the symbolic computation
done in MCS. Whenever the interpreter encounters an assignment statement, a predicate in a conditional,
or the passing of arguments to a macro, the simplifier is invoked to "evaluate" APL and logical expressions
by returning a simpler form if possible. In addition, the simplifier is called by the theorem prover
in attempting to prove generated verification conditions and to determine at each step in the symbolic
execution whether a stopping point has been reached. This ubiquity of points at which simplification is
required makes this the component of MCS in which the most time is spent.

The simplifier in symbolic execution must provide an extended semantics for each APL and logical
operator encountered. This in general agrees with the normal semantics when the operands are actual
values; e.g., the result of evaluating 3 + 2 remains 5. The extension is needed because operands
may now be symbolic and of various forms; each such operator and form of operand may require a different
semantic rule describing the action to be taken. For example, the result of exp + 0 is exp, and the
result of (-exp) + exp is 0, whatever the form of exp. Two desirable attributes of such an extended
semantics are to perform as much of the actual computation as possible, and to eliminate irrelevant parts
of expressions.

In the MCS system the simplifier consists of a set of rules for each operator, invoked by pattern
matching in a manner similar to the QA4 system [29]. They are domain-oriented, applying to a subset of
APL and logical expressions, and are designed to be easily modified by the user. Each consists of a
pattern, containing variables to be matched against parts of an expression to be simplified and perhaps
constrained by specified predicates; and a body, which, when instantiated by the variable values resulting
from the pattern match, is the simplified value of the expression.

Examples:

    (A,B)[M+⍳N] → B[M-(ρA)+⍳N], if (ρA)≤M and
                                   (ρB)≥(M-(ρA)+N)

is a rule with variables A, B, M, and N, pattern (A,B)[M+⍳N], body B[M-(ρA)+⍳N], and two constraints on
the variables. (These constraints must be passed to the theorem prover.)

    A+0 → A

has pattern A+0, body A, and no constraints on the single variable A.

When an expression is passed to the simplifier, each argument of the outermost operator is first
simplified. Then, the list of rules for that operator is searched. If a rule with pattern matching the expression
is found, the expression is replaced by the (instantiated) body of the rule, and the entire process is
repeated. When no applicable rules are found, the expression has been simplified. (This process is similar
to that followed in the SCRATCHPAD system of Griesmer, Jenks, and Yun [13].) Thus further extensions to
the semantics of an operator may often be made by the addition of rules for that operator.
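The control loop just described (simplify the arguments, search the rules of the outermost operator, replace and repeat) can be sketched in Python. The expression encoding as nested tuples, the rule functions, and the three sample rules are assumptions for illustration. MCS expresses its rules as LISP patterns over APL expressions, which this sketch does not attempt to reproduce.

```python
def add_constants(a, b):
    """Actual values: fall back on ordinary arithmetic."""
    if isinstance(a, int) and isinstance(b, int):
        return a + b
    return None


def add_zero(a, b):
    """exp + 0 -> exp (and 0 + exp -> exp)."""
    if b == 0:
        return a
    if a == 0:
        return b
    return None


def add_inverse(a, b):
    """(-exp) + exp -> 0, whatever the form of exp."""
    if a == ("neg", b):
        return 0
    return None


# One list of rules per operator; a rule returns None when it does not match.
RULES = {"+": [add_constants, add_zero, add_inverse]}


def simplify(expr):
    """Expressions are ints, symbol strings, or tuples (operator, arg, ...)."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [simplify(a) for a in args]        # arguments are simplified first
    for rule in RULES.get(op, []):
        result = rule(*args)
        if result is not None:
            return simplify(result)           # repeat the whole process
    return (op, *args)                        # no applicable rule: done
```

For instance, `simplify(("+", ("+", ("neg", "x"), "x"), "y"))` first reduces the inner sum to 0 and then applies the zero rule, leaving the symbol `"y"`. Extending the semantics of an operator amounts to appending a function to its rule list.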

In MCS, newly created expressions are simplified at once to prevent the propagation of unsimplified
forms. At present MCS has over 400 rules for the simplification of APL and logical expressions. Because
we are using APL operators to describe manipulations of registers, memory, and other machine components,
a large number of the simplification rules in our system relate to the APL indexing and concatenation
operators over vectors and matrices. Assignments made to particular locations in memory are effected by
forming a new APL expression for the memory by concatenation. To assure the correctness of the
simplification rules for these arrays (which are not always obvious), many of the identities which they express
have been proved to hold, using More's axiomatization of array theory [26] based on an APL-like language.

When the interpreter expands a macro with predicates, it must attempt to prove each predicate true
or false (using the current state vector values and predicate list) before it can proceed. But this is
in general impossible, since the predicates contain symbolic values. If a predicate can be shown true,
the interpreter continues with the expansion. Otherwise, since all possible paths must be followed, the
interpreter stops at this point and generates one subgoal of type TRACEIT for each predicate that cannot
be shown contradictory. In such a subgoal, the predicate (after instantiation by the state vector) is
added to the predicate list. Often, however, predicates can be reflected in the state vector values and
removed from the predicate list entirely. For example, if the branch corresponding to predicate $X=0
is to be taken, symbolic value $X can be replaced everywhere (including possible occurrences in the state
vector of the machine not being executed) by zero.

Assertions about portions of registers, such as $Y[⍳2]=1 1, can also be reflected in this way, here
by replacing $Y by 1,1,$Y[2+⍳30], if $Y is a 32-bit vector. Care must be taken in making such substitutions
to avoid loss of information. In the case of an equation of the form $X=f($X), substitution for $X can be
made if and only if f($X)=f(f($X)) can be shown [22]. When several simultaneous assertions are made, such
as when assuming the stopping and simulation conditions before path tracing, the process is slightly more
difficult. Predicates other than equalities must be retained on the predicate list. Reflecting
predicates in the state vector helps path tracing by making future predicates easier, or even trivial, to
prove, and by considerably simplifying the expressions for the state vector quantities.
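
Reflecting an equality such as $X=0 into the state vector is essentially a substitution over every stored expression. The Python sketch below shows the idea under stated assumptions: the tuple encoding of expressions and the names `substitute` and `reflect_equality` are hypothetical, and the side condition for equations of the form $X=f($X) is not checked.

```python
# Hypothetical sketch: reflect an equality predicate into a symbolic state vector.

def substitute(expr, symbol, value):
    """Replace every occurrence of symbol inside a nested-tuple expression."""
    if expr == symbol:
        return value
    if isinstance(expr, tuple):
        return tuple(substitute(e, symbol, value) for e in expr)
    return expr


def reflect_equality(state, predicates, symbol, value):
    """Apply symbol = value to the state vector and drop it from the predicate list."""
    new_state = {name: substitute(e, symbol, value) for name, e in state.items()}
    new_predicates = [
        substitute(p, symbol, value)
        for p in predicates
        if p != ("=", symbol, value)
    ]
    return new_state, new_predicates
```

Predicates other than the reflected equality stay on the list, with the symbol replaced inside them as well, which is what makes later predicates easier to decide.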

Normally, the abstract interpreter is applied first to the specification description (TRACEM) until
a stopping point is reached, and then to the processor level (TRACEMIIM). In each case, the interpreter
constantly checks to see if one of the stopping points defined by the components R_1, ..., R_n of T has been
reached; if so, a subgoal of type TRACEIT is generated. Note that all loops or potentially infinite
expansions in the machine description must have associated stopping points in some component R_i. In the
event that such a stopping point were omitted, the interpreter would repeatedly generate new subgoals at
a decision block in the loop, or it would perform macro expansions and contractions endlessly.

When a goal is generated for which both descriptions have reached stopping points, the
verification condition generator is invoked through the rule GENVC, still a member of class TRACEIT. This
verification condition generator has two main functions. The first is to verify that if R_i and R_j are
the simulation components reached by tracing the paths of the two abstract machines to stopping points,
then i = j; i.e., another point of correspondence in the simulation relation has been reached. The second
is to instantiate values from the two state vectors into the list of simulation conditions (simultaneous
verification conditions) given in the specified component R_i. GENVC generates a goal of class THMPRV for
each of these instantiated conditions, having as its pattern this theorem and the predicate list of the
current goal. Often, because of the continual simplification and incorporation of equalities into the
state vector during interpretation, these theorems are equalities between identical expressions. An
example of a generated theorem appears in Figure 13.

((8⍴0),((32⍴2)⊤(2⊥$SX)+2⊥$SMEM[2⊥$SCC[8+⍳24];8+⍳24])[8+⍳24])[⍳8] = 8⍴0

Figure 13. Sample theorem generated by GENVC and proved by the theorem prover.

Generally, the rule PROVEIT is used on goals in THMPRV. This invokes the theorem prover and simplifier
on the theorem of the pattern (see below). The theorem prover is embodied in simplification rules for
logical expressions, involving the relational operators = and ≤ and the logical connectives ∧, ∨, and ¬.
This theorem proving may be viewed as an extension of simplification. If a theorem simplifies to "true",
then the goal is achieved (and theorems are the only goals generated in proving simulation that can be
achieved directly, without generation of subgoals). If the theorem cannot be proved by the theorem prover,
its simplified form is the theorem of the pattern in a single generated subgoal. Two other rules in class
THMPRV may then be invoked. ANDSPLIT is applicable if the theorem to be proved is a conjunction; it
generates a separate subgoal for each conjunct, also in class THMPRV. CASESPLIT can be used to split the
given theorem into two cases on the basis of a predicate supplied when invoking the rule. The given
predicate is appended to the predicate list of one of these subgoals and its negation to the other. The
user may then proceed to split into further cases or to attempt the new subgoal directly using PROVEIT.
Then, since each subgoal contains more information, the theorem prover may be able to establish a theorem
which it was not powerful enough to establish before.
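
The two splitting rules can be pictured as simple functions from one goal to a list of subgoals. The following Python sketch is an illustration only: the `Goal` record, the tuple encoding of theorems, and the function names `and_split` and `case_split` are assumptions, not the MCS data structures.

```python
# Hypothetical sketch of ANDSPLIT- and CASESPLIT-style goal splitting.
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    theorem: tuple          # e.g. ("and", t1, t2) or ("=", lhs, rhs)
    predicates: tuple = ()  # assertions assumed true while proving the theorem


def and_split(goal):
    """One subgoal per conjunct, each keeping the original predicate list."""
    if goal.theorem[0] != "and":
        raise ValueError("and_split needs a conjunction")
    return [Goal(conjunct, goal.predicates) for conjunct in goal.theorem[1:]]


def case_split(goal, predicate):
    """Two subgoals: one assuming the predicate, the other assuming its negation."""
    return [
        Goal(goal.theorem, goal.predicates + (predicate,)),
        Goal(goal.theorem, goal.predicates + (("not", predicate),)),
    ]
```

Both cases returned by `case_split` must be proved for the parent goal to be achieved; each carries one more assertion than the parent, which is the extra information the text refers to.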


Though the necessity for theorem proving in MCS is most apparent in the need to prove the
verification conditions, it is also needed at various other points, such as in deciding predicates at branch points
and in determining whether a stopping point has been reached.


In  the  case  of  equalities  between  APL  expressions,  MCS  relies  heavily  on  the  fact  that  expressions 
which  can  be  proved  equal  have  the  same  simplified  form.  The  present  set  of  rules  certainly  is  not  complete 
in  the  sense  that  such  a canonical  form  is  assured  in  all  cases.  However,  the  incorporation  of  equalities 
into  the  state  vector  does  assure  that  both  sides  of  an  equality  are  expressions  over  the  same  set  of 
symbolic  constants  and  often  eliminates  the  need  for  references  to  the  predicate  list  of  assertions  to 
resolve  equalities. 


If all the theorems on all branches are proved true, the supervisor will mark the top level goal as
achieved, and simulation will have been shown. The goal tree itself is a record of the proof that whenever
the stopping points are reached, the simulation conditions are satisfied. Errors in the microcode
(or in the formal descriptions) are detected by being unable to prove theorems at the leaves of the goal
tree; by tracing back toward the root, information about the particular instruction or place in the
description at which the error occurred can be obtained.

Figure 14. Portion of Goal Tree for S-Machine Experiment

VII. THE AUTOMATED S-MACHINE EXPERIMENT

This simple machine was described in Section V, and the formal hand proof of the validity of the
microcode μCODE2 sketched. The formal descriptions of both machines S and μS, the microcode μCODE2, and
the simulation relation R were entered in the S/370 model 168 and analyzed by a preprocessor. This program
forms state vectors SSV and μSSV, which contain the elements of the respective facility vectors of S and
μS along with the initial symbolic values. The permanent value of CS is μCODE2, and the initial value
of each remaining component is its name prefixed by a dollar sign.

As pointed out above, natural places to choose for stopping points depend upon the architectural
description (in this case the S machine), and upon the structure of the microcode. In the original S
description [3], each machine instruction was described separately; in the microprogram, portions of the
code were shared by several instructions. To make the two descriptions more similar, new macros were added
to the S description to consolidate the work previously done in each instruction. These changes produced
a second architectural description which was easily shown by hand to simulate the initial one. The proof
was completed by showing a simulation relation between the new description and the register-transfer
description μS and using the transitivity of simulation.

The original hand proof of simulation had two pairs of domains or stopping points: begin instruction
(R_0) (isomorphic to end instruction) and machine stopped (R_5). The change in description allowed the
addition of the new domains R_1 (zero address instruction calculation), R_2 (one address calculation), R_3
(left shift simultaneous verification condition), and R_4 (right shift simultaneous verification condition).
The number of paths to be traced was reduced by this addition, and shorter paths and easier theorems
resulted.

All  of  the  pairs  of  paths  emanating  from  the  six  components  of  the  simulation  relation  were  traced, 
and  the  following  results  obtained. 


    Corresponding pairs of simulation paths traced:              45
    Verification conditions generated:                          236
    Verification theorems proved
        Without interaction:                                    222
        With interaction (splitting into two cases):              2
    Verification conditions not proved because of μP errors:     12

The system discovered the three errors that had been found previously by hand. However, a further error,
in address generation, was also found. Direct addressing, indirect addressing and indexed indirect addressing
stored an address as eight zeros followed by 24 address bits, while indexed addressing stored an address
as eight random bits followed by the 24 address bits. This is a new error not discovered in the hand
proof. The 24 address bits are correct, but bits 0-7 in the word are not masked to zero and
depend upon the contents of the index register.
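
The nature of this error can be shown with a small Python sketch. It is a simplified assumption, not the S-machine microcode: the function names are hypothetical, words are taken to be 32 bits with the address in the low-order 24 bits, and the indexed address is formed by plain addition.

```python
# Hypothetical illustration of the unmasked high-order byte in indexed addressing.
ADDRESS_MASK = 0x00FFFFFF  # low-order 24 address bits of a 32-bit word


def indexed_address_unmasked(base, index):
    """Faulty form: the top eight bits still depend on the index register."""
    return (base + index) & 0xFFFFFFFF


def indexed_address_masked(base, index):
    """Corrected form: the top eight bits are forced to zero."""
    return (base + index) & ADDRESS_MASK
```

Both functions agree on the 24 address bits, so a test that only follows the resulting storage reference would not expose the difference; a proof obligation stating that the top eight bits equal zero does.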

The corresponding corrections were made in the μP. Then MCS traced 45 paths, generated 236 verification
conditions and proved 236 theorems, with human aid to prove two. Thus the new μP was validated.


VIII.  THE  HTC  EXPERIMENT 

MCS is now being applied to a computer architecture which is more complicated in many ways than the
S-machine. The HTC (for Hybrid Technology Computer) is a product of the IBM Federal Systems Division at
Huntsville for space flight applications [32]. Its architectural specifications require it to support the
System/360 standard instructions (no decimal or floating point operations) with sixteen 32-bit general
purpose registers, 16 to 64 K bytes of main storage, and a single I/O channel for three types of I/O. This
architecture is implemented in a machine having a 1K memory of 64-bit microinstructions, a 64x16 bit
read-only memory for instruction decoding, an ALU and two input multiplexers, three working registers, and a
16-bit wide data path.

Part of the increased difficulty with the HTC lies in formalizing the description of the architecture
and register transfer levels. Architecturally, the HTC differs from the System/360 in several ways. In
addition to emulating the specified instruction set, the HTC microcode also implements interrupt and I/O
handling. Thus the formal architectural description must be pieced together from the System/360 documentation
and other HTC information. Also, in the machine instruction definitions, parts of the specifications
(such as condition code settings) were left intentionally ambiguous to allow implementation on various
S/360 models; the architectural description must allow for this vagueness. Our present approach has been
to use the APL ? operator in these cases, and to include specific simplification rules for handling expressions
with this operator. For example, if the two-bit condition code is specified to be unpredictable in
an instruction, it will have a value of ?2⍴2 at the end of that path. The microcode will give the condition
code some specific value, and the theorem prover must be able to answer "true" for the equality of the two.
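
One way to picture the treatment of unpredictable values is a comparison that succeeds whenever the architectural side is unspecified. The Python sketch below is an illustration under stated assumptions, not the MCS rule set: `UNPREDICTABLE` and `equal_for_simulation` are hypothetical names, and a sentinel object stands in for the APL expression ?2⍴2.

```python
# Hypothetical sketch: an unspecified architectural value matches any concrete value.
UNPREDICTABLE = object()  # stands in for an expression built with the APL ? operator


def equal_for_simulation(architectural_value, microcode_value):
    """True if the microcode result is acceptable for the architectural result."""
    if architectural_value is UNPREDICTABLE:
        return True
    return architectural_value == microcode_value
```

The comparison is deliberately one-directional: the specification may leave a value open, but a value the specification fixes must be matched exactly by the microcode.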

The single 16-bit wide I/O channel of HTC has been modeled by an array of 16 columns and an infinite
number of rows (see [26]). Input from the channel is done by taking the first row of the array as the
input word; output is done by concatenating rows onto the end of the array. Thus the words sent to the
channel in each machine can be compared when a stopping point is reached; their equality is necessary for
the proof of simulation.
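
The channel model can be sketched in Python as a queue of 16-bit rows. This is an assumption-laden simplification: the class and function names are hypothetical, and a finite `deque` stands in for the array with an infinite number of rows.

```python
# Hypothetical sketch of the channel model: rows of a 16-column array as 16-bit words.
from collections import deque


class Channel:
    def __init__(self, rows=()):
        self.rows = deque(word & 0xFFFF for word in rows)

    def read_word(self):
        """Input: take the first row of the array as the input word."""
        return self.rows.popleft()

    def write_word(self, word):
        """Output: concatenate a row onto the end of the array."""
        self.rows.append(word & 0xFFFF)


def channels_agree(architectural, micro):
    """At a stopping point the two channel arrays must hold the same rows."""
    return list(architectural.rows) == list(micro.rows)
```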

In the HTC, the register-transfer level has a 16-bit data path with 16-bit register and storage
locations; the architectural level is 8-bit byte-oriented. For example, each 32-bit general purpose
System/360 register is stored in two non-contiguous locations in the HTC scratch pad memory, and the
16Kx8 bit main store of the architectural description is implemented in an 8Kx16 bit memory in the
hardware. Thus the APL expressions stating the equalities between the machine components are more complex
than in the S-machine, where the architectural components are a subset of those at the register transfer
level.
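
The correspondence between a byte-oriented store and a 16-bit wide memory can be sketched as follows. The Python functions are illustrative assumptions: the names are hypothetical, and the byte order (even byte address in the high-order half of the halfword, as in System/360 storage) is assumed rather than taken from the HTC description.

```python
# Hypothetical sketch: an N x 8 byte store viewed through an (N/2) x 16 halfword memory.

def read_byte(halfwords, byte_address):
    """Fetch one byte; even addresses are assumed to sit in the high-order half."""
    word = halfwords[byte_address >> 1]
    shift = 8 if byte_address % 2 == 0 else 0
    return (word >> shift) & 0xFF


def write_byte(halfwords, byte_address, value):
    """Store one byte without disturbing the other half of the halfword."""
    index = byte_address >> 1
    shift = 8 if byte_address % 2 == 0 else 0
    kept = halfwords[index] & ~(0xFF << shift) & 0xFFFF
    halfwords[index] = kept | ((value & 0xFF) << shift)
```

A simulation condition for main storage then has to relate every architectural byte to half of a hardware location, which is one reason the equalities are more complex than in the S-machine.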

The HTC register transfer level description describes how the microcommands in a microinstruction act
to modify the values in the registers and storage in a real computer. Since the validity of the simulation
proof depends upon the accuracy of this description, only hardware entities are made part of the micro
machine state vector. However, macros should be written to be direct, and perform calculations in a
straightforward manner so that simulation speed is improved. These requirements conflict frequently. In
the HTC, the ALU is used during each microcycle, with inputs selected according to a complicated combination
of micro commands, and the output is similarly distributed. For clarity these inputs and the outputs were
represented as quantities in the state vector rather than in one large macro. Even so, processing was slow.
Examination of the machine microcode showed that frequently the ALU simply assembled and passed data
without performing arithmetic or logical operations. By testing the few fields indicating such constructions
and performing them directly the speed of operation of the micro machine was considerably improved. While
this type of analysis can be done by programs, it indicates that for efficient microcode validation a good
knowledge of both the microcode and the data flow is presently required.

The HTC is a real machine, so the methods by which it physically starts (stops) and the consequences
of its timing conditions and asynchronous actions must be modeled. Starting and stopping concerns both
the architectural and register-transfer levels while the timing conditions primarily affect the register
transfers. Since the HTC is an airborne computer in the Space Ultra-reliable Modular Computer (SUMC)
family, its connections with the outside world are controlled by microprogrammed Test Support Equipment
(TSE) which simulates an HTC channel and also has lines directly connected to the HTC. The HTC channel
and accompanying interrupts were mentioned earlier. The control lines (activated by buttons on the TSE
console) and their functions are shown in the following table:


Architecture      Register Transfer    Function

Power On/Off      Clock Start/Stop     Activate the basic clock cycle.

Reset             Hardware Reset       Set some registers and flip flops to known states.
                                       Then, by microprogram, set the remaining basic
                                       registers so the machine architectural and support
                                       functions may be initiated.

Soft-Stop         STOP                 Stop performing architectural functions and return
                                       to the reset condition ready to receive external
                                       commands.

The basic architectural functions which may be initiated are IPL and Read Paper Tape (for program entry).
The support functions are the usual maintenance functions: register display, stop and display registers
when a particular address (in main storage or in control store) is reached, clear and test main store, and
display main storage or control storage locations. All of these functions are initiated by the TSE and
performed under microprogram control using the HTC channel. Since these functions are implemented in
microcode, an architectural level description for them must be given and a simulation between the two levels
specified and proved.

The basic HTC machine cycle is 550 ns and the main storage cycle is 700 ns. One microinstruction is
performed per machine cycle. The action of the registers is asynchronous, but the microinstruction actions
are performed roughly sequentially except for interactions between the main store and the computer. The
action of the registers is determined by sequential leaves on the μ-HTC control tree, as was shown in the
S-machine example. The timing interaction between the computer and main storage is considered only to
ensure that the contents of affected registers, the Storage Data Register and the Instruction Register,
are valid. All timing is thus relative to the microinstruction execution, so a pseudo counter, CTR, was
introduced. Whenever a microinstruction is read, CTR is incremented. Each of the two registers affected
by timing is replaced by two pseudo registers. The first contains the usual register contents and this
register is set as if there are no timing restrictions. The second pseudo register contains the contents
of CTR at the time the first is set. When one of these registers is accessed, the value of the second
register is compared with the contents of the CTR, and if the difference is too small a timing error is
signaled. The Instruction Read Only Memory (IROM) initiates the only asynchronous action which may cause
difficulties. When a word from main storage is put in the left half of the instruction register and the
register is valid, the eight bits corresponding to the op code go immediately to the IROM as address.
This register too is represented by two pseudo registers, but the second register is updated by CTR+1
instead of CTR.
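
The pseudo-register scheme can be sketched in Python as a value paired with the counter reading at which it was set. Everything in this sketch is an assumption for illustration: the class names, the `min_cycles` threshold, and the `set_offset` parameter (used here to model the CTR+1 update) are hypothetical and do not come from the HTC description.

```python
# Hypothetical sketch of a register guarded by a timing pseudo register.

class TimingError(Exception):
    """Raised when a register is read before its contents are valid."""


class TimedRegister:
    def __init__(self, min_cycles, set_offset=0):
        self.value = None             # first pseudo register: the usual contents
        self.set_at = None            # second pseudo register: counter value when set
        self.min_cycles = min_cycles  # assumed minimum wait, in microinstructions
        self.set_offset = set_offset  # 1 models an update by CTR+1 instead of CTR

    def set(self, value, ctr):
        self.value = value
        self.set_at = ctr + self.set_offset

    def read(self, ctr):
        if self.set_at is None or ctr - self.set_at < self.min_cycles:
            raise TimingError("register accessed too soon after being set")
        return self.value
```

Because the check is expressed purely in counts of microinstructions, the symbolic interpreter does not need a notion of real time to detect a violated timing restriction.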

Finally  the  size  of  HTC  as  compared  with  the  S-machine  makes  necessary  a more  complex  simulation 
relation,  more  paths  to  be  interpreted,  and  more  difficult  theorems  to  be  proved.  The  HTC  implements  an 
instruction  set  three  times  larger  than  that  of  the  S-machine,  and  has  eight  times  as  many  microinstructions, 
each  of  which  is  four  times  larger. 

At present the HTC architectural and register transfer levels have been formally described, the simulation
relation has been partially formulated, portions of the microcode have been proved correct, and some
errors have been detected. Several of these have been subtle errors which are difficult to detect using
test cases. For example, one of them would occur only when fetching a 16-bit instruction from the last
halfword of addressable memory. Another more serious flaw was found in the implementation of the BALR or
branch-and-link instruction: in cases where the link information was to be stored in the same register
containing the branch address, the branch address was erroneously lost. None of these errors was found
by testing or simulation; they were detected because of unprovable theorems being generated in the
certification process. The goal structure of MCS enabled the user to go directly to a small segment of
microcode and correct the error.
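
The BALR flaw is an ordering problem that is easy to reproduce in miniature. The Python sketch below is a hypothetical reconstruction of the symptom, not the HTC microcode: the function names and the dictionary register file are assumptions, and details of the real instruction (such as the special meaning of register 0 as the branch register) are ignored.

```python
# Hypothetical sketch of a branch-and-link ordering error.

def balr_link_first(regs, link_reg, branch_reg, next_address):
    """Faulty order: the link is stored before the branch address is read."""
    regs[link_reg] = next_address
    return regs[branch_reg]  # wrong target when link_reg == branch_reg


def balr_read_first(regs, link_reg, branch_reg, next_address):
    """Corrected order: the branch address is saved before the link is stored."""
    target = regs[branch_reg]
    regs[link_reg] = next_address
    return target
```

The two versions differ only when both operands name the same register, a case that test programs rarely exercise but that path tracing covers because the register numbers are symbolic.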

IX.  CONCLUSIONS 


Our aim in the SVAE system has been not only to obtain proofs of correctness, but also to detect and
correct errors. The structuring of the proof into goals imposed by the supervisor aids in this detection.
Indications of error, such as being unable to prove a theorem, failing to reach an expected stopping domain
and unexpected branches, occur when the user invokes a rule in a particular subgoal. Examination of the
branches of the goal tree leading to the error indication provides some information as to the cause of the
error.

The problem reduction approach provided by the MCS supervisor has several advantages over both hand-proof
techniques and more straightforward automated methods. Most important of these is that no path to
be traced can be overlooked; the supervisor always returns to the unproved goals. Also, the goal tree
provides  a record  of  the  way  in  which  the  proof  was  developed.  Finally,  the  capability  to  decide  the 
order  in  which  the  goals  are  to  be  manipulated  permits  partial  proofs;  certain  sections  of  program  or 
microcode  can  be  verified  even  before  correctness  criteria,  in  the  form  of  simulation  relations,  have  been 
developed  for  other  parts. 

The automated parts of our system were found most useful for coordinating parts of the proof, simplifying
symbolic APL expressions, and running the symbolic interpreter. The immediate simplification of
expressions and processing of predicates encountered contributed greatly to the simplicity of the final
theorems generated; this was especially helpful in view of the interactive nature of the system. Also found
valuable were the ability of the user to control the proof direction through the supervisor and the rules
for processing them while the proof is in progress.

Of course, the human contribution to the SVAE automated validation of implementations consists of more
than interaction with the MCS system. Describing the abstract machines which embody the system specifications
is by no means trivial, especially when the English "principles of operation" are vague. Also, specification
of the simulation relation requires some understanding of how the program being verified works (location
of loops, etc.), though use of MCS to interpret symbolically a single program may provide aid in developing
simulation conditions and stopping points. The judicious choice of stopping points can greatly reduce the
number of the paths which must be followed and the theorems which must be proved.

The successful detection and correction of errors in a small microprogram using SVAE, and our progress
toward the verification of an actual microcoded implementation, have confirmed our beliefs that computer
aid in validating the design of computer systems is needed and valuable, and that the notion of simulation
between programs facilitates this automation.

Summary

A definition  of  the  system  integrity  function  is  given  in  order  to  enable  comparative  evaluation  of 
different  design  approaches  for  digital  flight  control  systems  with  respect  to  the  system  integrity  and 
mission  survival.  Two  main  failure  detection  methods  are  analytically  investigated  with  respect  to  the 
integrity  function,  the  passive  failure  detection  and  the  selfdiagnosing  active  failure  detection.  The 
HFB  320  experimental  system  is  briefly  described  as  an  example  of  the  selfdiagnosing  active  failure 
detection  method. 


1.  Introduction 

During  the  last  decade  a number  of  fly-by-wire  experimental  systems,  both  analog  and  digital,  have  been 
developed  and  flight-tested  in  several  countries  (e.g.  ref.  1,  2,  3),  most  with  the  intention  of 
demonstrating  the  feasibility  of  flight  control  by  electronic  means  without  mechanical  back-up.  Of  course, 
other  aspects  such  as  new  control  law  concepts  were  also  explored,  but  the  main  concern  has  been  to 
ensure  the  required  levels  of  integrity  for  the  hardware  and  software  of  the  control  system.  Most  of  these 
systems  differ  considerably  in  the  design  philosophy  used  to  ensure  the  required  levels  of  integrity.  But 
so  far  not  enough  theoretical  work  has  been  done  to  allow  sufficiently  precise  quantitative  comparison  of 
system integrity levels between different design philosophies. In particular, this is true in the case of
digital  systems. 

This article attempts to contribute to the development of easily adaptable methods for analytically
comparing designs. Although more general investigations were completed, only those for the electronic part
of the system hardware and its software are described here. In that context, the criterion for the mission
abort, which is derivable from the loss of integrity, is a reasonable quantity to look at. This criterion
might be the maximum number of detected failures which are permitted, but it might also be, in more general
terms, the maximum permitted probability of total loss within the mission or operation time of the system.
Here the latter is used by applying the integrity function. This function represents the actual status
with respect to the integrity. It contains the basic information from which the extrapolation of the
probability of total system loss during the remaining mission time can be derived.

There are two reasonably cost effective means to ensure system integrity for flight control systems: first
redundancy, and second the addition of monitoring and switching mechanisms in order to make use of the
available redundancy in case of a malfunction. The potential of the redundancy concept is well established,
whereas the possible potentials of the malfunction detection mechanism have not been explored thoroughly,
particularly regarding its effect on the integrity or, for the mission abort criterion, on the probability
of total system loss. Therefore some quantitative relationships will be outlined in the following, showing
the effect of the quality of the malfunction detection mechanism on the integrity function. Additionally,
a brief description will be given of the HFB 320 experimental digital fly-by-wire system. For this system
an unusual approach to malfunction detection called selfdiagnosing active failure detection (SAFD) has
been mechanized.

2. Principles of malfunction detection

The  fundamental  detection  principle  which  must  be  applied  in  all  cases  is  the  comparison  of  output  signals 
from  functionally  equivalent  processing  units.  These  output  signals  are  independently  derived  from  the 
same  input  signal  and  are  dependent  on  the  status  of  the  process.  Discrepancies  at  the  comparator  indicate 
a malfunction.  Next  the  test  principle  is  applied.  The  way  of  applying  the  test  principle  determines  how 
long  it  takes  for  a malfunction  to  become  evident.  The  objective  of  the  test  method  is  to  ensure  that  the 
input  signal  adequately  exercises  all  components  in  the  signal  processing  unit.  The  higher  the  test 
frequency  for  all  processing  states,  the  faster  the  detection  of  any  malfunction  at  the  output. 

In this article two different test schemes are evaluated. One is independent of the process and its status
and the other is dependent on the status of the process. In the latter case the test signal is simply the
normal, unmodified input signal. This is controlled by the statistics of the process and not by the
detection mechanism. This case will be defined as passive failure detection, as opposed to active failure
detection. For the latter the test is carried out independently by periodical checks of all states of
each system component. In most cases the active failure detection occurs simultaneously with the process,
i.e. no process interrupts are necessary for testing (ref. 3, 4).

Although  real  versions  of  failure  detection  are  not  necessarily  mechanised  exactly  in  one  of  these  two 
ways,  we  will  use  these  in  the  following  for  the  purpose  of  covering  as  wide  a spectrum  as  possible  of 
similar  systems. 


3.  Failure detection performance - integrity function


It is well known from the literature (ref. 4, 5, 6) that the stochastic process of failure occurrences in
electronic components is described as a stationary Markov process, where the density function f(t) is
exponential:

(3.1)  $f(t) = \lambda e^{-\lambda t}$

and the coefficient $\lambda$, known as the failure rate, is considered constant. The distribution function F(t),
which is the probability of a failure event $t_p$ prior to t, defined as

$F(t) = P\{t_p < t\}$

represents the integral of the density function:

(3.2)  $F(t) = \int_0^t f(\tau)\, d\tau = 1 - e^{-\lambda t} = 1 - R(t)$

where R(t), known as the reliability function, is the probability that no failure occurred up to t.

An important property of this type of stochastic process is that the distribution function for a certain
time interval $(t_a, t_a + \tau)$ does not depend on the time prior to $t_a$; it depends only on the width $\tau$ of the
time interval. This can be shown by the following relationship (ref. 5, 6, 7): If $R(t_a, t_a + \tau)$ is the
conditional probability that no failure will occur within the interval $(t_a, t_a + \tau)$, assuming also that no
failure has occurred before $t_a$, then (see fig. 3.1)

(3.3)  $R(t_a, t_a + \tau) = \frac{R(t_a + \tau)}{R(t_a)} = \frac{e^{-\lambda (t_a + \tau)}}{e^{-\lambda t_a}} = e^{-\lambda \tau} = 1 - F(t_a, t_a + \tau)$ .
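The independence of the elapsed time stated by equ. (3.3) can be checked numerically. The following is a minimal sketch; the function names and the failure rate of 1e-4 per hour are illustrative assumptions and are not taken from the source.

```python
import math

def reliability(t, lam):
    """R(t) = exp(-lam * t), the reliability function of equ. (3.2)."""
    return math.exp(-lam * t)

def conditional_reliability(t_a, tau, lam):
    """R(t_a, t_a + tau) = R(t_a + tau) / R(t_a), as in equ. (3.3)."""
    return reliability(t_a + tau, lam) / reliability(t_a, lam)

lam = 1e-4   # assumed failure rate per hour (illustrative only)
tau = 2.0    # width of the look-ahead interval in hours

# The conditional reliability is the same for every start time t_a.
for t_a in (0.0, 10.0, 1000.0):
    print(t_a, conditional_reliability(t_a, tau, lam), math.exp(-lam * tau))
```

Each printed row shows the conditional value next to exp(-lam * tau); the two agree up to rounding for any start time.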

For an electronic component with failure detection capability, from which at the actual time $t_a$ correct
information has been obtained that no failure has occurred so far, again $R(t_a, t_a + \tau) = R(\tau)$. As can
be concluded from fig. 3.1, the value of $R(t_a, t_a + \tau)$ can be considerably above that of the original R(t).
In that case, $R(t_a, t_a + \tau)$ can even be 1 for $\tau = 0$, i.e. full integrity has been established. Of course,
it has been assumed that the failure detector worked ideally and that, corresponding to its message, no failure
has in fact occurred. That is not true for most applications, and $R(t_a)$, which is $R(t_a, t_a + \tau)$
with $\tau = 0$, is less than 1. Therefore shortcomings in the performance of the failure detector directly
increase the uncertainty about the integrity at the measurement time $t_a$. As a measure of this uncertainty, $R(t_a)$ is
used in the following and is called the integrity function. The integrity function contains what is
known up to $t_a$ and it can be understood as a kind of transition function for the probability of surviving
the rest of the mission.

In order to evaluate the effect of the different methods of failure detection on the integrity function,
the degree of redundancy, which of course also affects the overall integrity, is first eliminated. This can
be achieved by the assumption that sufficient spare units are available to repair each failure which has
been detected.

With this assumption, the integrity function $R(t_a)$ can be derived by considering a single channel unit only
and by assuming that no failure has been detected up to $t_a$.


4.  The  integrity  function  of  signal  processing  systems 
with  passive  failure  detection 

In  this  section,  the  integrity  function  will  be  derived  for  an  electronic  signal  processing  system  with 
passive  failure  detection  for  the  system  structure  shown  in  fig.  4.1.  Corresponding  to  the  assumption  that 
sufficient spare units are available to repair each failure which has been detected (section 3), fig. 4.1
shows a single system channel consisting of the signal processor (SP) and the failure detector (PD). The
PD  consists  of  the  comparator  and  a second  signal  processor  which  produces  the  signal  for  comparison.  A 
detected  failure  will  be  repaired  by  switching  over  to  a spare  unit  of  the  same  structure.  This  structure 
is,  for  instance,  similar  to  that  of  a duo-duplex  system  with  passive  failure  detection.  With  regard  to 
the  following  analysis  this  structure  has  significant  differences  to  other  system  layouts,  where  passive 
failure  detection  is  carried  out  by  majority  or  mid-value  voting  etc.  For  these  latter  systems,  more 
aspects  have  to  be  accounted  for  than  is  outlined  in  the  following  sections.  In  particular,  the  event  of 
double-  or  multi-failures  becomes  much  more  important.  This  will  briefly  be  covered  in  the  appendix. 

For  a system  structure  corresponding  to  fig.  4.1  two  aspects  in  addition  to  redundancy  constraints  are 
important  for  system  integrity.  These  are  the  effects  of 

- a failure in the failure detector and
- dormant failures in the signal processor.

Both  will  be  covered  in  the  following. 
4.1  A failure in the failure detector (PD)

So far as is known, there exists no system with passive failure detection which can thoroughly diagnose
failures in the failure detector itself. This kind of failure mode is very undesirable: a subsequent
failure within the signal processor cannot be prevented from propagating into other connected units
because the failure detector fails to raise an alarm. Therefore, this case has to be covered by the integrity
function.

For  the  sake  of  the  following  analysis  two  main  failure  modes  are  defined: 

1.  SPF  - failure  within  the  active  signal  processor  (SP) 

2.  PDF  - failure  within  the  failure  detector. 

The  second  is  the  one  to  be  mainly  treated  in  this  section. 

The  failure  PDF  is  separated  further  into  two  categories  with 

PDF+  designating  a failure  of  the  failure  detector  which  will  be  indicated  and 

PDF-  designating  a failure  of  the  failure  detector  which  will  not  be  indicated. 

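The statistical characteristics of these failure modes are of the same nature as given for electronic
components in equ. (3.1) and (3.2). The density functions are:

$f_{PD}(t) = \lambda_{PD}\, e^{-\lambda_{PD} t}$

$f_{PD+}(t) = \lambda_{PD+}\, e^{-\lambda_{PD+} t}$ ,  $f_{PD-}(t) = \lambda_{PD-}\, e^{-\lambda_{PD-} t}$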


For the failure event PDF-, the integrity function must at least include the probability that prior to a
possible failure SPF a failure PDF- has occurred. This event is designated by {C} and is considered to be
equivalent to the loss of the system integrity. The event {C} can be illustrated by the diagram in fig. 4.2
using $t_{SP}$ and $t_{PD-}$, the time values of the failure events, as the random variable coordinates. All
possible combinations of random events SPF and PDF- within the elapsed time of operation are contained in
the shaded area. {C} is dependent on the actual time of operation $t_a$ and it attains its maximum when $t_a$
approaches $t_e$, the end of the mission. The integrity function which only considers this failure event
is designated as $R_{PD-}(t_a)$; it represents the probability that {C} does not occur within the time interval
$(0, t_a)$:

$R_{PD-}(t_a) = 1 - F_{PD-}(t_a) = 1 - P\{C\}$


The probability P{C} can be determined from fig. 4.2 by integrating the joint density function of the
events SPF and PDF- over the shaded area (ref. 5):

(4.3)  $P\{C\} = \iint_{\{C\}} f_{SP}(t_{SP})\, f_{PD-}(t_{PD-})\; dt_{PD-}\, dt_{SP}$

The events SPF and PDF- are independent. As was done in equ. (4.3), the product of the density functions
of SPF and PDF- can therefore be substituted for the joint density function. The densities are given by

(4.4)  $f_{SP}(t_{SP}) = \lambda_{SP}\, e^{-\lambda_{SP} t_{SP}}$  and  $f_{PD-}(t_{PD-}) = \lambda_{PD-}\, e^{-\lambda_{PD-} t_{PD-}}$ ,

such that the integration in equ. (4.3) results in

$P\{C\} = 1 - e^{-\lambda_{SP} t_a} - \frac{\lambda_{SP}}{\lambda_{SP} + \lambda_{PD-}} \left(1 - e^{-(\lambda_{SP} + \lambda_{PD-})\, t_a}\right)$


In fig. 4.3, $(1 - R_{PD-}(t_a))$, which is equal to P{C}, is plotted for different values of the rate of the
failure event PDF-. The figure shows that for a failure rate $\lambda_{SP} = 10^{-4}$/[h], a realistic magnitude, the
integrity function decreases significantly from 1.
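The integration of equ. (4.3) can be cross-checked numerically. The sketch below assumes that the shaded area of fig. 4.2 is the region 0 <= t_PD- <= t_SP <= t_a, i.e. the undetected detector failure precedes the signal processor failure and both lie within the elapsed time. This reading of the figure, the function names and the rates are assumptions made for illustration.

```python
import math

def p_c_closed_form(t_a, lam_sp, lam_pd_minus):
    """P{C} for the assumed region 0 <= t_PD- <= t_SP <= t_a."""
    s = lam_sp + lam_pd_minus
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for small x
    return -math.expm1(-lam_sp * t_a) + lam_sp / s * math.expm1(-s * t_a)

def p_c_numeric(t_a, lam_sp, lam_pd_minus, steps=20000):
    """Midpoint-rule integration of f_SP(t) * F_PD-(t) over 0 <= t <= t_a."""
    h = t_a / steps
    total = 0.0
    for n in range(steps):
        t = (n + 0.5) * h
        f_sp = lam_sp * math.exp(-lam_sp * t)
        # inner integral of f_PD- over 0 <= t_PD- <= t
        big_f_pd = -math.expm1(-lam_pd_minus * t)
        total += f_sp * big_f_pd * h
    return total

# Illustrative rates per hour; not claimed to be the values of fig. 4.3.
for t_a in (1.0, 5.0, 10.0):
    print(t_a, p_c_closed_form(t_a, 1e-4, 0.5e-4), p_c_numeric(t_a, 1e-4, 0.5e-4))
```

Under this assumption P{C} grows monotonically with the actual time of operation, in line with the remark that the event attains its maximum at the end of the mission.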

4.2  Dormant  failures 


As opposed to analog systems, the information in digital systems is split up bit-wise and the processing
functions are also often separated component-wise, so that there are only two signal states (0, 1). During
operational modes a great number of components are temporarily not actively involved in the process. During
this period of idle or stationary operation these components cannot be tested by passive failure detection,
and the corresponding failures are known as dormant failures. This kind of failure, of course, has a great
effect on the integrity function. It follows that the integrity function depends not only on the statistical
distribution of component failures but also on the distribution of the excitation of each system component
and the corresponding operational mode. For the case of an unfavourable statistical distribution of the
operational modes, i.e. a great probability of long time intervals between failure occurrence and failure
detection, the system integrity may be lost for a long time with no indication. A special case of a
dormant failure which will not be indicated at all is the PDF- as defined in section 4.1. Only extensive
software, which will not be dealt with in this article, can bring some relief to this problem. In this
section, the effect of dormant failures on the integrity function will be described.

In a similar manner to what was done in section 4.1, the set of potential dormant failures within the
signal processing unit will be defined here by the set {D}. Then, the integrity function can be written
as

(4.6)  $R^{*}(t_a) = 1 - P\{D\}$

or, including both the signal processor and the failure detector, the resulting integrity function is

(4.7)  $R(t_a) = 1 - F(t_a) = 1 - P\{C + D\}$


Fig. 4.4 shows the area of all elements of {D} for a signal processing component k in the $t_{SP(k)}$-$\Delta t_{MOD(k)}$
plane. The random variable $t_{SP(k)}$ gives the time of the failure event in component k, and $\Delta t_{MOD(k)}$ represents
the time interval between the failure in component k and the next attempt to operate in mode k addressed to
component k. The shaded area in fig. 4.4 represents the subset {D(k)} for any k = 1, 2, ... N of the
complete set {D}. The actual time of operation $t_a$ and the mission completion time $t_e$ define the limits of
the area {D(k)}. If the turn-on rate for the operational mode k, designated as $\gamma_{MOD(k)}$, is assumed constant,
then the density function for the next turn-on can be considered to be of the exponential type. Therefore,
in the course of this article, it will be assumed that

(4.8)  $f_{MOD(k)}(\Delta t_{MOD(k)}) = \gamma_{MOD(k)}\, e^{-\gamma_{MOD(k)} \Delta t_{MOD(k)}}$


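A more precise formulation of the density function can be gained from experimental data. Assuming that the
events of operational mode turn-on and of failures are independent, the probability of event {D(k)} is
then obtained by

(4.9)  $P\{D(k)\} = \int_0^{t_a} \int_0^{t_e - t_{SP(k)}} f_{SP(k)}(t_{SP(k)})\, f_{MOD(k)}(\Delta t_{MOD(k)})\; d\Delta t_{MOD(k)}\, dt_{SP(k)} - \int_0^{t_a} \int_0^{t_a - t_{SP(k)}} f_{SP(k)}(t_{SP(k)})\, f_{MOD(k)}(\Delta t_{MOD(k)})\; d\Delta t_{MOD(k)}\, dt_{SP(k)}$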


By substituting for the density functions corresponding to equs. (4.1) and (4.8) and evaluating the integrals,

(4.10)  $P\{D(k)\} = \frac{\lambda_{SP(k)}}{\lambda_{SP(k)} - \gamma_{MOD(k)}} \left(1 - e^{-t_a (\lambda_{SP(k)} - \gamma_{MOD(k)})}\right) \left(e^{-t_a \gamma_{MOD(k)}} - e^{-t_e \gamma_{MOD(k)}}\right)$
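The closed form of equ. (4.10) can be compared with a direct numerical integration over the area of fig. 4.4. The sketch below reads that area as a failure at time t within (0, t_a) whose next mode turn-on falls between t_a and t_e; this reading, the function names and the rates are illustrative assumptions.

```python
import math

def p_dk_closed_form(t_a, t_e, lam, gamma):
    """Equ. (4.10) for one component/mode pair; requires lam != gamma."""
    d = lam - gamma
    return (lam / d * (1.0 - math.exp(-t_a * d))
            * (math.exp(-t_a * gamma) - math.exp(-t_e * gamma)))

def p_dk_numeric(t_a, t_e, lam, gamma, steps=20000):
    """Midpoint-rule integration over the failure time t in (0, t_a)."""
    h = t_a / steps
    total = 0.0
    for n in range(steps):
        t = (n + 0.5) * h
        f_sp = lam * math.exp(-lam * t)
        # inner integral of gamma * exp(-gamma * dt) for dt from (t_a - t) to (t_e - t)
        p_turn_on = math.exp(-gamma * (t_a - t)) - math.exp(-gamma * (t_e - t))
        total += f_sp * p_turn_on * h
    return total

# Illustrative values: failure rate per hour, turn-on rate per hour, times in hours.
print(p_dk_closed_form(0.5, 1.0, 1e-4, 1.0), p_dk_numeric(0.5, 1.0, 1e-4, 1.0))
```

The factor (exp(-t_a * gamma) - exp(-t_e * gamma)) vanishes for t_a = t_e, so the probability of a still-dormant failure drops to zero at the end of the mission.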


Since the complete set {D} has to be considered for the integrity function in equs. (4.6) and (4.7),

(4.11)  $\{D\} = \{D(1)\} + \{D(2)\} + \dots + \{D(N)\}$ ,

the probability of the event {D} can be derived from the probabilities of the events {D(k)}, corresponding
to equ. (4.10), by

(4.12)  $P\{D\} = 1 - \prod_{k=1}^{N} \left(1 - P\{D(k)\}\right)$


For equ. (4.12) it is assumed for the sake of simplicity that the events {D(k)} are independent. This may
not be true in real cases. Most digital systems are coupled in a way that this assumption does not hold.
Then the probability P{D} has to be derived from the sum of the probabilities P{D(k)} and their coupling
terms. Higher order couplings can be neglected because of their insignificant contribution to the result for
P{D}. For example, if only pairs of coupling terms are considered, the relationship for P{D} is given
by

(4.13)  $P\{D\} = \sum_{k=1}^{N} P\{D(k)\} - \sum_{l=1}^{N} \sum_{\substack{k=1 \\ k>l}}^{N} P\{D(k), D(l)\}$ .


To determine the coupling terms P{D(k), D(l)}, the fact that more than one operational mode is
addressed to each component k and that each operational mode may be addressed to several components must
be taken into account. Although this can also be treated analytically in rather simple terms by use of
a combinational matrix, no further outline will be given in this article.

In the simplest case, assuming independence, equ. (4.7) can be reformulated to yield

(4.14)  $R(t_a) = 1 - \left(P\{C\} + P\{D\} - P\{C, D\}\right) = 1 - \left[P\{C\} + P\{D\}\,(1 - P\{C\})\right]$


Further, substituting for P{D} using equ. (4.12) yields

(4.15)  $R(t_a) = 1 - \left[P\{C\} + \left[1 - \prod_{k=1}^{N} \left(1 - P\{D(k)\}\right)\right] (1 - P\{C\})\right]$


Fig. 4.5 shows a diagram where equ. (4.15) is evaluated for a mission length of $t_e$ = 1 [h], a typical
hardware example of a signal processing unit with

$\lambda_{SP(k)} = 10^{-?}$/[h]
$\lambda_{PD} = 10^{-?}$/[h]
$\lambda_{PD-} = 0.5 \cdot 10^{-4}$/[h]


and a mode configuration of N = 10 or 100. The rate of mode turn-on $\gamma_{MOD(k)}$ is assumed to be the same for
all modes. It varies from 0.1/[h] to 10/[h]. As can be seen from the plot, the more modes in which the
signal processor is operating, the lower the integrity levels. The rate of mode turn-on also changes the
time behavior of the integrity function. For a rate of 1 turn-on per 1/10 of an hour, the integrity
function has a very shallow minimum at a rather small value of the mission time. For a rate of 1 turn-on
per 10 hours the integrity function decreases continuously up to about 90 per cent of the complete mission
time $t_e$. Of course, at the end of the mission, the integrity function recovers to the value of (1 - P{C}),
because, by definition, for $t_a = t_e$ the value of P{D} becomes zero.
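The dependence of equ. (4.15) on the number of modes and on the actual time of operation can be explored with a short script. This is a sketch only: it assumes N identical and independent modes, takes P{C} as a given input, and uses placeholder rates that are not the values behind fig. 4.5.

```python
import math

def p_dk(t_a, t_e, lam_k, gamma):
    """Equ. (4.10) for one component/mode pair; requires lam_k != gamma."""
    d = lam_k - gamma
    return (lam_k / d * (1.0 - math.exp(-t_a * d))
            * (math.exp(-t_a * gamma) - math.exp(-t_e * gamma)))

def integrity(t_a, t_e, n_modes, lam_k, gamma, p_c):
    """Equ. (4.15) with N identical, independent modes; p_c is P{C} at t_a."""
    p_d = 1.0 - (1.0 - p_dk(t_a, t_e, lam_k, gamma)) ** n_modes  # equ. (4.12)
    return 1.0 - (p_c + p_d * (1.0 - p_c))

t_e = 1.0  # mission length in hours
for n_modes in (10, 100):
    for t_a in (0.0, 0.25, 0.5, 0.75, 1.0):
        # lam_k and gamma are hypothetical; p_c = 0 isolates the dormant failures
        r = integrity(t_a, t_e, n_modes, lam_k=1e-5, gamma=1.0, p_c=0.0)
        print(n_modes, t_a, 1.0 - r)
```

With p_c set to zero the printed loss of integrity is P{D} alone; it is larger for 100 modes than for 10 and returns to zero at t_a = t_e.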


4.3  Constrained redundancy

In  the  foregoing  sections  the  assumption  was  made  that  detected  failures  are  repaired  by  available  spare 
units  right  away  no  matter  how  many  failures  have  occurred.  This  was  done  to  eliminate  the  effect  of  finite 
redundancy  on  the  integrity  and  to  focus  only  on  the  effect  of  the  performance  of  the  failure  detection. 
Thus,  in  the  foregoing  sections,  detected  failures  were  without  influence  on  the  integrity  function. 

However, if the redundancy is constrained the system integrity is also influenced by detected failure events
because of the resulting reduction in available redundancy. For a signal processor with as many as x
redundant units as shown in fig. 4.1, the integrity function, i.e. the probability of no failure occurrence
in all of the x units until the actual time of operation $t_a$, is

(4.16)  $R(t_a, x) = x \cdot R(t_a) = 1 - F(t_a)^{x}$

Substituting for $R(t_a)$ from equ. (4.7)

(4.17)  $R(t_a, x) = x \cdot [1 - P\{C + D\}]$

Certainly, considering the criterion for a mission abort, which for instance should ensure a safe return,
the probability of surviving the mission or its complement, the probability of total loss within the
mission, becomes important. In that case, using the results of the foregoing sections, the survival of
the mission is to be excluded if, for a system with Z units out of x still without detected failures,

1.  a failure  of  the  type  PDF-  occurs  in  one  of  the  Z units 

2.  all  Z remaining  units  have  failed. 
The second condition includes all combinations of Z unit failures corresponding to the pattern: i
undetected failures at the actual time $t_a$ and Z - i failures within the time interval $(t_a, t_e)$.

Thus the probability of total loss within the mission as a function of the actual time of operation can
be formulated as

(4.18)  $1 - R_{PD-}(t_a) + \sum_{i=0}^{Z} \binom{Z}{i}\, R_{PD-}(t_a)\, \left(1 - R^{*}(t_a)\right)^{i} \left(1 - R(t_a, t_e)\right)^{Z-i}$


This  is  applicable  for  a criterion  for  the  mission  abort  of,  for  instance. 


(4.19) 


''TL<^a>  1 


$R^{*}(t_a)$ from equation (4.6) is substituted into equation (4.18) and $R(t_a, t_e)$ can be derived in a manner
similar to that used for equ. (3.3). The expression $(1 - R_{PD-}(t_a))$ represents the system loss due to a
failure PDF- within the failure detector.

In fig. 5.3, equ. (4.18) is evaluated for the numerical example of fig. 4.5. Remarks are given in the next
section in conjunction with corresponding results for active failure detection.


5.  The  integrity  function  of  signal  processing  systems 
with  active  failure  detection 


As defined in section 2, active failure detection is carried out in a prearranged way, such that a thorough check of
all components is made within deterministic time intervals. The test procedure can be either sequential
or simultaneous (ref. 3, 4). No realization of a continuous active failure detector with no delay between
failure occurrence and failure detection is known, but the test cycle can be chosen as short as necessary
so that the probability of so-called double failures is small enough to be neglected. Double
failures disable the comparison check because similar failures occur in both the signal processor and
the signal used for the comparator within the time interval without test activity.

For  the  system  with  active  failure  detection,  the  same  general  system  structure  as  shown  in  fig.  4.1  for 
the  system  with  passive  failure  detection  can  generally  be  used. 

Three  distinctions  should  be  noted: 

1.  Inherently several comparators are required within the single channel structure of a signal processing
system with active failure detection.

2.  A selfdiagnosing  comparator  as  one  crucial  part  of  the  active  failure  detector  is  feasible  (ref.  3)  as 
contrasted  with  the  passive  failure  detection  case. 

3.  In  most  applications  the  complete  signal  processor  need  not  be  doubled  in  order  to  generate  the  signals 
to  be  compared  to  each  other. 

Therefore,  by  virtue  of  the  selfdiagnosing  comparators  and  the  complete  test  within  small  time  intervals 
the  integrity  function  of  a signal  processor  with  active  failure  detection  is  not  dependent  on  the  effects 
of  dormant  failures.  Thus,  if  the  test  cycle  time  is  small  enough,  about  the  only  effect  on  the  integrity 
to  be  considered  comes  from  the  redundancy  constraints.  In  order  to  demonstrate  this  effect,  the  integrity 
function  will  be  again  derived  for  the  case  of  unconstrained  redundancy. 


5.1  Without  redundancy  constraint 

If no redundancy constraints are assumed, the same assumptions that were used in sections 4.1 and 4.2 are
valid, i.e. all failed components are repaired at the instant their failure is detected. The
test cycle time of the selfdiagnosing active failure detector (SAFD) is chosen as T, such that the
integrity function for the values of the actual time $t_a$

(5.1)  $t_a = \nu \cdot T$ ,  $\nu$ = 0, 1, 2, ..., is

(5.2)  $R(t_a) = 1$ ,  i.e. all test information is available

and the integrity is established. During the time intervals between the discrete test instants no
information is given of interim failure events by the failure detection mechanism. When the actual time of
operation $t_a$ falls within the $\nu$-th test cycle, such that $\nu T \le t_a \le (\nu + 1) T$ (fig. 5.1), then the
probability of no failure event within the interval $(\nu T, t_a)$ is identical to the integrity function. By
virtue of the selfdiagnosing property, the effect of a failure in the failure detector is not different
from that of a failure in the signal processor itself. As opposed to passive failure detection, only the
total failure rate of both signal processor and SAFD components, designated as $\lambda_{SP/AD}$, is used. Thus
(fig. 5.1), the integrity function is


(5.3)  $R(t_a) = R(\nu T, t_a) = e^{-\lambda_{SP/AD}\,(t_a - \nu T)}$  with $\nu$ = 0, 1, 2 ... and $\nu T \le t_a \le (\nu + 1) T$ .
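The sawtooth behaviour described by equ. (5.3) can be illustrated as follows. This is a minimal sketch: the function names, the combined failure rate and the test cycle time are assumed values chosen for illustration, not data from the source.

```python
import math

def safd_integrity(t_a, T, lam):
    """Equ. (5.3): R(t_a) = exp(-lam * (t_a - nu * T)) with nu = floor(t_a / T)."""
    nu = math.floor(t_a / T)
    return math.exp(-lam * (t_a - nu * T))

def worst_case_integrity(T, lam):
    """Lower bound of R(t_a), approached just before the next test instant."""
    return math.exp(-lam * T)

lam = 1e-4       # assumed total failure rate of signal processor plus SAFD, per hour
T = 2.0 ** -10   # assumed test cycle time in hours (a power of two keeps nu exact)

for t_a in (0.0, 0.25 * T, 0.75 * T, T, 1.5 * T):
    print(t_a, safd_integrity(t_a, T, lam))
print("minimum over a cycle:", worst_case_integrity(T, lam))
```

The integrity returns to 1 at every test instant and never falls below exp(-lam * T), which is the minimum value referred to in connection with fig. 5.2.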

Fig. 5.2 shows a plot where equ. (5.3) is evaluated for the minimum value of $R(t_a)$, for several values of
$\lambda_{SP/AD}$ and varying T. The shaded area in fig. 5.2 indicates the region where the T-values of known systems
with SAFD have been chosen. It becomes obvious that the loss of integrity can be kept extremely small
within the intervals between the test instants. The values of $\lambda_{SP/AD}$ are chosen so that these results for
the integrity function can be compared with those for the signal processor with passive failure detection
in fig. 4.5.


5.2  Constrained  Redundancy 

In a similar way to that discussed in section 4.3, the influence of limited redundancy on the system
integrity is also outlined for a system with selfdiagnosing active failure detection (SAFD).

The  integrity  function  of  the  complete  system  is  also  determined  by  the  integrity  function  of  a single 
channel  (equ.  (5.3))  multiplied  by  the  full  number  of  units  available  for  operation.  As  outlined  in 
section  4.3,  the  probability  of  total  system  loss,  which  is  also  dependent  on  the  integrity  function, 
becomes  more  important  in  the  context  of  the  criterion  for  the  mission  abort.  This  probability  can  be 
given for the system with SAFD in a similar form to that outlined in section 4.3. Although the influence
of  the  noncontinuous  test  information  is  negligible  in  most  cases,  it  will  be  considered  for  the  sake  of 
completeness.  For  a system  with  2 remaining  units  available. 


(5.4) 


Although the similarity to equ. (4.18) is obvious, the results are different because of the predominance
of the last term $(1 - R(t_a, t_e))^{Z-i}$; in contrast to $R^{*}(t_a)$ of equ. (4.18), $R(t_a)$ in equ. (5.4) is very close
to one and does not fall below its minimum values as plotted in fig. 5.2 (see also fig. 4.5). This
characteristic can also be concluded from fig. 5.3, where both equ. (4.18) and equ. (5.4) are evaluated
for the signal processing system with passive failure detection used as an example in section 4, and also
for an equivalent example of a system with SAFD. The curves for the latter case, plotted as dashed lines,
decrease considerably with increasing actual time of operation, whereas for the case with passive failure
detection, the probability of total system loss does not change much throughout the complete mission. This
indicates that for passive failure detection the criterion for the mission abort derived from the number
of detected failures makes some sense. For systems with active failure detection, however, the probability
of total system loss can recover considerably with increased time of operation.

For a certain $F_{TL\,max}$ = const., established as a criterion and plotted in fig. 5.3, a first failure prior
to $t_1$ in a system with three redundant units and a mission time of 10 hours results in a mission abort in
both cases. However, a first failure within the time interval $(t_1, t_2)$, as indicated in fig. 5.3, only
results in a mission abort for the passive failure detection case. The system with SAFD can still be
operated without functional degradation. Notice that $(t_1, t_2)$ is about one third of the entire mission
time.


6.  HFB  320  experimental  system 

For the purpose of investigating the feasibility of selfdiagnosing active failure detection (SAFD) in all
parts of a digital fly-by-wire flight control system, the DFVLR is conducting an experimental program on
the HFB 320 aircraft, as shown in fig. 6.1. The functional subsystems and components installed in the
aircraft are illustrated in the same figure. Besides manual control via the mechanical back-up control system,
the functional layout of this system includes three fundamental modes of operation for the purpose of
stabilization and flight path control:

- Full  automatic  control 

- Semiautomatic control, i.e. flight path demand control by use of the side grip
controller (see fig. 6.2)

- Manual  fly-by-wire  control. 

In this section, a brief description of the control computer, which is the signal processing part of the
system, will be given with particular emphasis on the integrity. Essentially a new computer had to be
developed for the purpose of incorporating active failure detection methods into the computer design.

The  computer  structure  is  shown  in  fig.  6.3  where  all  peripheral  systems,  such  as  experimental  keyboard, 
tape reader, power supply etc. are not illustrated. The failure detection is mechanised by hardwiring
methods, but for this experimental program no use is made of hardware integration methods for miniaturisation
purposes.

Before  more  details  are  given  about  computer  subsystems,  a brief  outline  of  the  logic  structure  will  be 
presented. 
6.1  Logic Structure of the Control Computer

The  control  computer  has  to  be  a real  time  computer  which  is  one  of  the  main  factors  for  establishing 
the  characteristics  of  the  logic  structure.  The  input-output  data  flow  with  respect  to  the  peripheral 
system,  such  as  sensors,  the  instrumentation  and  the  actuation  systems,  is  autonomously  managed  by  the 
interfaces.  The  interface  memory  is  accessible  by  the  main  program  which  is  also  the  case  for  internal 
read  only  and  random  access  memory,  i.e.  a direct  memory  access  is  mechanised  with  respect  to  the  input- 
output  data. 

The  computer  memory  is  byte-organised.  It  is  divided  into  two  parts,  the  data  and  the  program  memory.  The 
address  domain  for  the  instructions  for  data  transfer  and  arithmetic  operations  only  includes  the 
addresses  of  the  data  memory  and  those  of  the  interface  memories.  The  capacity  of  the  data  memory  is 
960 bytes, of which 192 bytes (64 data words) are read only and the remaining random access.

The program memory comprises 7231 bytes. The instructions stored in this memory are of variable word
length, i.e. 2, 3 or 4 bytes, depending on the amount of information to be contained. The instruction
set consists of 7 instructions with respect to the memory, 3 transfer instructions, 8 jump
instructions, and 64 instructions for other functions. The potential for subprogramming also exists. The internal
information transmission is brought about by a bus system consisting of four types of buses: the
instruction bus (IN), the operand bus (OF) and the result bus (PE) as data buses to and from the arithmetic and
logic unit, and the control feedback bus (CF) for enabling the next program step. All buses work
serially and are monitored for failures by SAFD.

6.2  Main  Redundancy  and  Failure  Detection  Concept 

The computer is split up into subsystems, so-called modules. There are five functionally different modules,
the ALU, memory, input interface, output interface, and the failure detection management system (fig. 6.3).
Except for the last two, all modules are duplicated. The redundancy of the output interface is achieved
internally. Because of its selfdiagnosing property the failure management system need not be duplicated,
except for the switching units. The data links are also duplicated.

The output of each duplicated module is linked to the failure management system by a
switching unit such that only one of the two modules provides the data which are to be fed into another
pair of modules. Because of these switching units the failure management system is able to completely
eliminate a failed module from the process. The failure message is generated within the module where the failure
occurred. Besides indicating the failures and switching to spare modules, the failure management system
ensures fail-safe behavior for the case when two modules of the same kind have failed.

In summary, because of the redundancy concept, the system survives one arbitrary failure. The system also
survives without degradation in its functional performance when, in addition to the first failure, another
failure occurs in a different module. If both parts of one pair of modules have failed, the fail-safe routine
is employed.

6.3  Module Failure Detection

In this section, two modules are described in more detail as typical examples in order to give more insight
into how active selfdiagnosing failure detection is realized.

6.3.1  Arithmetic  and  Logic  Unit  (ALU) 

As shown in fig. 6.4, the data to be worked on are fed into the ALU module via the operand bus from the
memory of the input interface. The output of the ALU is transmitted to the memory and the output interface via
the result bus. The instruction bus and the control feedback bus couple the ALU to the main control unit
within the memory module. There are buffering registers because the transmission frequencies used on the
buses are different from that used inside the ALU. The accumulator works serially in order to keep the number
of components down. The internal control unit is a read only memory which can easily be checked.

In order to detect failures actively within the accumulator, two methods are applied. The first
method is that of using Manchester coding to represent the information by signal changes (e.g.
1 to 0) instead of the signal amplitude. The occurrence of these signal changes is equivalent to
a dynamic test signal and can be considered as such for the purpose of selfdiagnosing active
failure detection (ref. 3). The data entering the registers are also coded in this way. The
second method is the sequential test. If SAFD is to be realized for the ALU, both methods are
used for the testing procedure for simplicity's sake. For further reduction of the number of
components, the ALU works serially. In order to carry out the sequential test, the registers are
lengthened to hold a test sequence. This sequence is coded in the same way as the main signal and
is monitored at the ALU input and output. The test sequence can be chosen such that all possible
internal states of the ALU components are tested. It has been shown that a test sequence of 8 bit
is sufficient to achieve this complete test. When an addition is to be carried out, the test
sequence passes the adder first. As soon as the test response is provided, i.e. after 8 shifting
cycles, the addition itself can start.
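The Manchester-coding principle described above can be sketched as follows. This is a minimal illustration, not the ALU hardware itself: it assumes the convention that a 1 is represented by the half-bit pair (1, 0) and a 0 by (0, 1), and the function names are hypothetical. A bit cell without a signal change is treated as a detected failure, which is the property that makes the code usable as a dynamic test signal.

```python
def manchester_encode(bits):
    """Encode each data bit as two half-bits with a transition in between.

    Assumed convention (not stated in the source): 1 -> (1, 0), 0 -> (0, 1).
    """
    out = []
    for b in bits:
        out.extend((1, 0) if b else (0, 1))
    return out


def manchester_decode(halves):
    """Decode half-bit pairs; raise ValueError if a bit cell has no transition.

    A pair (0, 0) or (1, 1) is what a stuck signal line would produce,
    so it is reported as a failure instead of being decoded.
    """
    if len(halves) % 2:
        raise ValueError("odd number of half-bits")
    bits = []
    for i in range(0, len(halves), 2):
        pair = (halves[i], halves[i + 1])
        if pair == (1, 0):
            bits.append(1)
        elif pair == (0, 1):
            bits.append(0)
        else:
            raise ValueError(f"no transition in bit cell {i // 2}")
    return bits
```

Because every valid bit cell contains a signal change, a line that is stuck at either level is noticed within one bit time, without any separate test pattern.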

6.3.2  Input  Interface 

The input interface contains 64 data channels of both analog and digital inputs. Periodically all
of this input information is fed into the interface memory in a predefined order. Failure
detection for the analog inputs, in particular for the multiplexer, is achieved by a combination
of detection methods. On the one hand the analog multiplexer is doubled (fig. 6.5) and its
outputs are compared by a precision comparator. On the other hand an additional spare input
channel is used for a test signal which is generated by the internal processing controller. The
test signal contains a certain number of discrete amplitude levels such that the multiplexer is
tested over its full amplitude range. The comparison takes place within the processing controller
after the signal has passed the A/D converter. Again the digital signal is coded by use of the
Manchester code. The integrity of the coding mechanism is checked at the memory input and output;
for that purpose all received data are read back after they have been fed into the memory. The
processing of the digital inputs is carried out in a similar but, of course, simpler way.


7. Conclusions

Despite the fact that there are a great number of system designs and developments for digital
flight control systems, not enough work has been done in the area of quantitative comparison of
these systems with respect to integrity and mission survival probability. This article
contributes to the analytical use of the integrity function, which contains the information about
the system status at the actual time of system operation and which can be used for the
determination of the probability of system loss during the remaining time of the mission.

In particular, the influence of the quality of the malfunction detector on the integrity function
is evaluated for two versions, passive failure detection and active failure detection. It is
shown that for passive failure detection the integrity function, as a function of the actual time
of operation, is deteriorated considerably by possible events of dormant failures. It follows
that the probability of system loss during the rest of the mission, as a function of the actual
time of operation, is increased significantly relative to the corresponding results for
selfdiagnosing active failure detection (SAFD). For a system with SAFD, a great amount of extra
operation time can possibly be gained before a mission abort becomes necessary.

Of  course,  rather  simple  systems  have  been  compared  within  this  article,  but  the  analytical  relationships 
are  also  valid  in  principle  for  quantitative  comparisons  of  more  complex  systems. 

8.  Appendix 

If the monitoring signal processor in fig. 4.1 is taken from the system and its function is
replaced by the spare redundant units, the structure of a system with malfunction elimination by
majority voting or mid-value voting is obtained. In this case, in addition to the situation
described in section 4, a critical situation arises if half of the available redundant units have
failed undetected through dormant failures.

The probability of Z/2 dormant failures of the same kind in a system with Z available redundant
units and no failures detected in these units until the actual time t_a is, as a function of t_a
(equ. 4.10),

(A.1)  P_D(Z/2; t_a) = \sum_{k=1}^{N} \sum_{i_1 < \dots < i_m} P\{D(k,i_1), \dots, D(k,i_m)\},  m = Z/2

The example of an available redundancy of four units would yield

(A.2)  P_D(2; t_a) = \sum_{k=1}^{N} \sum_{i_1=1}^{4} \sum_{i_2 > i_1} P\{D(k,i_1), D(k,i_2)\}

or, by approximating equ. (A.2), assuming that {D(k,i_1)} and {D(k,i_2)} are independent and that
P{D(k,i_1)} = P{D(k,i_2)},

(A.3)  P_D(2; t_a) = 6 \cdot N \, [P\{D(k)\}]^2
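
The factor 6 in equ. (A.3) can be checked by counting. The following is a short worked step, assuming as in the text that the dormant-failure events in different units are independent and equally probable, and additionally that the probability is the same for every failure kind k. The symbol p is introduced here only as shorthand for P{D(k)}.

```latex
\binom{4}{2} = \frac{4!}{2!\,2!} = 6
\quad\Longrightarrow\quad
\sum_{k=1}^{N} \; \underbrace{\sum_{i_1 < i_2} p^{2}}_{6\ \text{unit pairs}} \;=\; 6 \cdot N \cdot p^{2}
```

Each of the N failure kinds contributes one term per unordered pair of units, which gives the product 6 · N.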

Fig. 4.1  SINGLE CHANNEL SYSTEM STRUCTURE WITH PASSIVE FAILURE DETECTION

Fig. 4.4  AREA OF POSSIBLE EVENTS OF DORMANT FAILURES AS A FUNCTION OF THE ACTUAL TIME t_a

Fig. 4.5  INTEGRITY FUNCTION FOR SYSTEM CHANNEL WITH PASSIVE FAILURE DETECTION


Summary

Flight guidance systems have to meet severe reliability requirements for reasons of
flight safety. This applies in particular to aircraft equipment and flight missions that
can no longer be controlled manually by the pilot. Digital realization of future flight
guidance systems is expected for cost reasons. This technology facilitates the introduction
of advanced redundancy procedures that take advantage of the ability of digital systems to
self-detect failures through suitable procedures.

Procedures for detecting failures in the hardware of the signal processing have been
developed. This failure self-detection is carried out by means of suitable test programmes,
and it is supervised by an external supervisor. The design of this supervisor is based on
the principle of a watchdog timer.

Two  supervisor  systems  have  been  developed  on  the  basis  of  this  principle.  The  improved 
version  allows  in  addition  the  execution  of  nominal/actual  value  comparisons  in  the 
supervisor,  and  this  increases  the  failure  self-detection  probability. 

The  possible  applications  of  the  failure  self-detection  are  discussed. 


1. Introduction

The requirements to be met to ensure the reliability of flight guidance systems
are very high for reasons of flight safety. This applies in particular to flight
equipment and missions that can no longer be manually controlled by the pilot, such
as automatic landings under bad weather conditions or unstable aircraft (CCV).

The  presently  existing  flight  guidance  systems  have  been  designed  in  accordance 
with  the  principles  of  parallel  redundancy.  This  resulted  in  a great  amount  of 
hardware  and  problems  for  testing  and  maintaining  the  systems. 

The following characteristics will be typical for the next generation of flight
guidance systems:

- increased performance requirements for the flight guidance
- economy requirements, i.e. reduction of the hardware to a minimum
- improvement of the testability and maintainability
- reliability

The  additional  condition  of  economy  can  be  met,  for  instance,  by  means  of  new 
non-conventional  reliability  principles. 

Cost investigations show that with increasing complexity of the flight guidance
systems, the price/performance ratio of digital systems is more favourable than
that of analog systems. Digital signal processing facilitates the introduction of
advanced redundancy techniques. In particular, the ability of digital systems to
detect failures automatically by suitable means can be utilized.

The  redundancy  principles  developed  by  Bodenseewerk  consist  in  detecting  and 
localizing  hardware  failures  of  the  flight  guidance  system  by  means  of  suitable 
checking  and  test  programmes. 

The essential elements of a flight guidance system are:

- signal processing unit composed of
    - Processor (CPU)
    - Read-only memory (ROM)
    - Working storage (RAM)
    - Interface
    - Real-time clock
- Sensors
- Actuator system
- Pilot's control panel

The  failure  self-detection  procedure  allows  failures  to  be  detected  and  localized 
in  the  signal  processing  unit. 

The  monitoring  of  sensors,  actuator  system  and  pilot's  control  panel  can  be  ensured 
by  means  of  suitable  software  monitors  integrated  in  the  signal  processing  unit. 


2. Principles of Failure Self-detection

The  principles  of  the  failure  self-detection  are  described  in  the  following: 

In  self-monitored  systems  it  has  to  be  basically  ensured  that  the  self-monitoring 
is  supervised  by  an  external  self-contained  unit. 

This external supervisor must be fail-safe within the scope of the specified
failure rates. The supervision of the failure self-detection is necessary because the
processor of the signal processing unit can only detect failures as long as it is
itself functioning correctly. The external supervisor takes over the evaluation and
monitoring of the failure self-detection programmes and disconnects the monitored
control channel (figure 1) in the event of a failure.


FIGURE  1 

FAILURE  SELF-DETECTION  IN  AUTOMATIC  FLIGHT  GUIDANCE  SYSTEMS 


2.1 Automatic Supervisor System (ASS)

If only a single monitoring function is carried out, an automatic supervisor system can be
built which operates in principle as a watchdog timer. Unless the processor gives a reset
instruction to the supervisor in time, the supervisor automatically carries out a
disconnecting instruction (figure 2) after an adjustable delay, i.e. on overflow of its timer.


FIGURE 2

AUTOMATIC SUPERVISOR SYSTEM (ASS)


This simple automatic supervisor therefore detects specific failures in the programme
(e.g. infinite loops) as well as clock failures of the real-time clock and the clock
generator.
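
The watchdog principle described above can be sketched as follows. This is a simplified software model under stated assumptions, not the supervisor hardware: the class name, the tick-based timing and the latching of the disconnect state are all introduced here for illustration.

```python
class WatchdogSupervisor:
    """Counter-based watchdog: disconnects the channel on counter overflow."""

    def __init__(self, limit):
        self.limit = limit          # adjustable delay, in supervisor clock ticks
        self.count = 0
        self.disconnected = False

    def reset(self):
        """Reset instruction given by the monitored processor."""
        if not self.disconnected:   # assumption: a disconnect is latched
            self.count = 0

    def tick(self):
        """Advance the supervisor's own, independent clock by one tick."""
        if self.disconnected:
            return
        self.count += 1
        if self.count >= self.limit:
            self.disconnected = True   # overflow: disconnect the channel
```

A programme caught in an infinite loop, or a stopped processor clock, simply stops calling `reset()`, so the counter overflows and the channel is disconnected without any cooperation from the failed unit.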


In order to enable the supervisor also to detect failures in the power supply of
the signal processing unit, it has been designed so that the shut-down or change-over
relay assigned to the supervisor disconnects the channel in the event of a
breakdown of the power supply.


Further failures in the signal processing unit can be detected by means of suitable
failure self-detecting programmes. These programmes are executed in flight
during the intervals available between the computing cycles.

The following tests are carried out within the scope of these failure self-detection
programmes:

A complete check of the instruction repertory is performed. A suitable test
programme is executed with preset constant input values. By utilizing
all instructions available in the processor, results are obtained which
allow, by comparison with known nominal values, conclusions to be drawn about
failures in the processor hardware.


The  contents  of  the  read-only  memory  (ROM)  are  verified  by  parity-check  in 
the  columns. 
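
A column parity check of this kind can be sketched as follows. The sketch assumes that the reference parity word is stored alongside the ROM contents and that even parity is used per bit column; the function names are hypothetical.

```python
from functools import reduce


def column_parity(words):
    """XOR of all words: bit j of the result is the parity of bit column j."""
    return reduce(lambda a, b: a ^ b, words, 0)


def rom_ok(words, stored_parity):
    """Return True if the recomputed column parity matches the stored word."""
    return column_parity(words) == stored_parity
```

A single flipped bit anywhere in the ROM changes the parity of its column and is detected. An even number of bit errors in the same column cancels out, which is a known limitation of simple parity.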


The working storage (RAM) has to be checked in a different manner from the
ROM, since the contents of the RAM change continuously and are therefore
not known.

The following items are checked by means of the developed software procedure:

- the memory cells
- the address registers
- the decoder network
- the read and write amplifier
- the switch-over units for read/write

By using complementary input values, a double check is
carried out by means of extensive test routines.
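
The double check with complementary values can be sketched for the memory cells as follows. This is a minimal illustration only: it covers the cells, not the address registers or decoder network, and the function names, the 8-bit word width and the pattern 0x55 are assumptions. Because the RAM contents are live data, each cell is saved before the test and restored afterwards.

```python
def check_ram_cell(ram, addr, pattern=0x55, width=8):
    """Write a pattern and its complement to one cell and read both back."""
    mask = (1 << width) - 1
    saved = ram[addr]                      # contents are unknown, so keep them
    ok = True
    for value in (pattern & mask, ~pattern & mask):
        ram[addr] = value
        if ram[addr] != value:             # a stuck bit fails one of the two
            ok = False
    ram[addr] = saved                      # restore the working data
    return ok


def check_ram(ram, width=8):
    """Return the list of addresses whose cells failed the double check."""
    return [a for a in range(len(ram)) if not check_ram_cell(ram, a, width=width)]
```

Using a value and its complement forces every bit of the cell through both states, so a bit stuck at 0 or at 1 fails one of the two read-backs.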

- Interface  test 

The  interface  is  supervised  with  the  known  wrap-around  procedure  by  using  a 
software  monitor. 

After completion of each of the described individual tests, a reset instruction
is given to the supervisor if no failure has been detected during the test.

In the event of a failure, the supervisor can be set directly, or in the simplest
case indirectly, by the programme entering a defined stop.

The described interplay of failure self-detection programme and supervisor has
been checked in laboratory tests and flight tests. All failures, whether
artificially generated or occurring spontaneously in the course of these tests,
have been detected by the failure self-detection and the supervisor, and the
control system has never been switched off without reason.


2.2 Programmable Automatic Supervisor System (PASS)

An  essential  feature  of  this  failure  self-detection  procedure  was  substantially 
improved  about  two  years  ago. 

It  is  easy  to  see  that  certain  failures  cannot  be  covered  by  the  described 
failure  detection  principle. 

It has been shown that many tests performed by the failure self-detection procedure
end in a nominal/actual value comparison. This comparison cannot be performed reliably
in the signal processing unit, as any failure of the instructions or hardware
necessary for the comparison could result in a misinterpretation which might
release a reset instruction to the supervisor irrespective of the
result of the nominal/actual value comparison.

If  all  nominal/actual  value  comparisons  are  performed  externally  and  independently, 
a substantial  increase  of  the  procedure  reliability  is  possible. 

For this purpose the supervisor was extended to a programmable automatic supervisor
system. This was achieved without too great an increase in the hardware
(figure 3).


[Figure 3 signals: coded label, actual/expected value]

FIGURE 3

PROGRAMMABLE AUTOMATIC SUPERVISOR SYSTEM (PASS)

Coded marks (A_i, E_i) are offered to the PASS by the programme. These marks are
continually varied by the coding procedure so that the more significant half of each
mark is identical with the less significant half of the previously generated
mark (figure 4).


[Figure 4 shows the sequence of marks (A_1, E_1) ... (A_n, E_n); PASS RESET IF A_n = E_{n-1}]

FIGURE 4

INTERPLAY OF FAILURE SELF DETECTION AND PASS


The marks therefore have to be distributed in the programme so that a defined
sequence is maintained while the programme runs. Due to the required equality of
the more significant half of the mark and the less significant half of the previous
mark (A_i = E_{i-1}), an actual/nominal value comparison independent of the computer
can now be performed in the supervisor by transmitting to the supervisor the
expected result E_{i-1} prior to each test. At the end of the test the actual result
A_i of the test is transferred to the supervisor in the more significant half of the
new mark. A control unit of the PASS compares the actual value A_i with the nominal
value contained in the buffer. In the event of equality the control unit applies a
reset signal to the timing element of the supervisor PASS. In the event of inequality
a disconnect signal is generated in the supervisor.
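
The mark-chaining comparison can be sketched as follows. This is an illustrative model only: the class name, the 8-bit half width and the way the first expected value is loaded are assumptions, and the watchdog timing element is reduced to a reset counter.

```python
class Pass:
    """Model of the PASS comparison: reset only if A_i equals the buffered E_(i-1)."""

    def __init__(self, first_expected, half_bits=8):
        self.half_bits = half_bits
        self.buffer = first_expected      # E_0, expected result of the first test
        self.disconnected = False
        self.resets = 0                   # stands in for resetting the timing element

    def offer(self, mark):
        """Accept one coded mark; return True if a reset signal was issued."""
        if self.disconnected:
            return False
        actual = mark >> self.half_bits                       # A_i, upper half
        expected_next = mark & ((1 << self.half_bits) - 1)    # E_i, lower half
        if actual == self.buffer:
            self.buffer = expected_next   # store expected result of the next test
            self.resets += 1
            return True
        self.disconnected = True          # inequality: disconnect the channel
        return False
```

Since the comparison happens in the supervisor, a processor fault that corrupts its own compare instruction cannot produce a reset: a wrong result or a mark offered out of sequence breaks the chain.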

This  principle  is  generally  used  for  the  above  described  test  programmes.  In 
addition,  the  programme  flow  monitoring  and  hardware  failure  monitoring  can  be 
considerably  improved  concerning  the  following  items  by  using  the  supervisor  PASS: 


- loop control
- jump instructions
- inadmissible frequency variations of the real-time clock.


3. Applications

Taking into account the safety requirements to be met by a flight guidance system,
the degree of system redundancy is derived from the flight-mechanical characteristics
and the mission requirements the aircraft equipment has to meet. The
selection of the redundancy concept, for instance parallel redundancy or failure
self-detection, clearly depends on the respective specification, the weight, the
volume and the electrical power consumption as well as on the latest state of
the art and the associated costs for development, production and certification.

The certification costs can be very high for advanced technologies and procedures
not yet proved in operational application, such as the method of failure
self-detection.

A conventional digital triplex system would therefore be used today for
fail-operational applications such as, for instance, automatic bad-weather landing.

If  it  can,  however,  be  proved  that  the  described  procedure  of  failure  self-detection 
operates  with  a failure  detection  probability  of  1 - 10"'*,  it  would  then  be 
possible  to  use  a duplex  system  with  failure  self-detection  (figure  5)  instead 
of  a state-of-the-art  triplex  system. 


FIGURE 5

DUPLEX-SYSTEM WITH FAILURE-SELF-DETECTION

The failure detection probability of 1 - 10 has to be specified for fail-operational
uses as a probability of about 10"^/h can be assumed for the hardware
failure of one channel. Such a hardware failure is catastrophic if it is not
detected by the failure self-detection.

The probability of catastrophic failure is therefore 10 ^/h on the basis of the
above mentioned assumptions. The failure self-detection is a promising procedure
for the next aircraft equipment generation since the hardware amount is considerably
lower than for conventional parallel redundant systems. The evidence of the failure
detection necessary for the certification has to be furnished through future
investigations.
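
The arithmetic behind this argument can be written out symbolically. The symbols below are introduced here for illustration and do not appear in the original: \lambda_{HW} is the hardware failure probability per hour of one channel, p_D the failure detection probability, and \lambda_{cat} the resulting probability of catastrophic failure per hour, assuming that a hardware failure is catastrophic exactly when it goes undetected.

```latex
\lambda_{cat} = \lambda_{HW}\,(1 - p_D),
\qquad
p_D = 1 - 10^{-a},\;\; \lambda_{HW} = 10^{-b}/\mathrm{h}
\;\;\Longrightarrow\;\;
\lambda_{cat} = 10^{-(a+b)}/\mathrm{h}
```

The exponents of the detection shortfall and of the channel failure rate therefore simply add, which is why the detection probability has to be specified so close to 1.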

SUMMARY

An overview of the development and application of a circuit analysis technique is presented in
this paper. The technique is based on an aerospace discovery that topological criteria exist that can be
used to recognize unplanned operational modes of a circuit. The analysis technique involves encoding
circuitry data from detailed schematics for computer processing. The computer processing produces
simplified, topological network trees which represent the system circuitry. The network trees are analyzed
by the application of sneak circuit conditions. The results obtained from a variety of complex electrical
systems analyses are also presented as positive corroboration for this circuitry analysis technique.


BACKGROUND 

A sneak circuit is a latent path or condition in an electrical/electronic system which inhibits a
desired condition or initiates an unintended or unwanted action. A sneak circuit is not caused by
component failures, but is a condition that has been inadvertently designed into an electrical/electronic
system. Sneak circuits often exist because many subsystem designers lack the overall system visibility
required to electrically interface all subsystems properly. Some sneak circuits are evidenced as
"glitches" or spurious operation modes and can occur in mature, thoroughly tested systems after long use.
Sometimes sneaks are the real cause of problems blamed on electromagnetic interference or ground "bugs".
Sneak Circuit Analysis should not, therefore, be confused with other unrelated analysis techniques such as:

- Failure Modes and Effects Analysis
- Fault Tree Analysis
- Electromagnetic Compatibility Analysis
- Parametric Analysis
- Reliability Analysis
- Hardware Oriented Testing and Troubleshooting

The unpredictable nature of sneak circuit conditions prompted the National Aeronautics and Space
Administration (NASA) to fund an investigation to determine a method of identifying sneak circuit conditions
before their occurrence could pose any possible threat to the safety of Apollo, Skylab or Apollo-Soyuz
Test Program (ASTP) crew members.

The Boeing Aerospace Company began such a study in November 1967, which was composed of a detailed
review of historical incidents of sneak circuit conditions in various electrical systems. A sneak circuit
was defined for this study as "... a designed-in signal or current path which causes an unwanted function
to occur or which inhibits a wanted function." The definition excluded component failures and
electrostatic, electromagnetic, or leakage paths as causative factors. The definition also excluded improper
system performance because of marginal parametric factors or slightly out-of-tolerance conditions.

The 1967 historical incident investigation resulted in two significant findings: (1) Sneak circuits
are universal in complex electrical systems and their analogs, and (2) Topological criteria exist which
enable recognition of all planned or unplanned operational modes within a system. Boeing developed a
computer-aided analysis technique based on the 1967 findings which established the following analytical
goals: (1) The simplification and reduction of electrical system detail schematics to topological network
trees, (2) The recognition of the basic topographical patterns inherent in all circuitry, and (3) The
application of the appropriate clues which have been found to characterize sneak circuit conditions.

NETWORK  TREE  PRODUCTION 

The first major consideration that must be satisfied to identify sneak circuit conditions is to
ensure that the data being used for the analysis represent the actual "as-built" circuitry of the system.
Functional  schematics,  integrated  schematics,  and  system  level  schematics  do  not  always  represent  the 
actual  constructed  hardware.  Detail  manufacturing  and  installation  schematics  must  be  used  because 
these  drawings  specify  exactly  what  was  built,  contingent  on  quality  control  checks,  tests,  and  inspections. 
However,  manufacturing  and  installation  schematics  rarely  show  complete  circuits.  The  schematics  are 
laid  out  to  facilitate  hookup  by  technicians  without  regard  to  circuit  or  segment  function.  As  a result, 
analysis  from  detail  schematics  is  extremely  difficult.  So  many  details  and  unapparent  continuities  exist 
in  these  drawings  that  an  analyst  becomes  entangled  and  lost  in  the  maze.  Yet,  these  schematics  are  the 
data  that  must  be  used  if  analytical  results  are  to  be  based  on  true  electrical  continuity.  The  first 
task  of  the  sneak  analyst  is,  therefore,  to  convert  this  detailed,  accurate  information  into  a form 
useable  for  analytical  work.  The  magnitude  of  data  manipulation  required  for  this  conversion  necessitates 
the  use  of  computer  automation. 

Automation has been used in sneak circuit analysis since 1970 as the basic method of tree production
from manufacturing-detail data. Computer programs have been developed to allow encoding of simplified
continuities in discrete "from-to" segments from detail schematics and wire lists. The encoding can be
accomplished without knowledge of circuit function. The computer connects associated points into paths
and collects the paths into node sets. The node sets represent interconnected nodes that make up each
circuit. Plotter output of node sets and other reports are generated by the computer to enable the
analyst to easily sketch accurate topological trees. The computer reports also provide complete indexing
of every component and data point to its associated tree. This feature is especially useful in
cross-indexing functionally related or interdependent trees, in incorporating changes, and in troubleshooting
during operational support.
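
The step of collecting "from-to" segments into node sets can be sketched as follows. This is not the original program: it is a minimal illustration that uses a union-find structure, a technique chosen here, and the function name and point labels are hypothetical.

```python
from collections import defaultdict


def build_node_sets(segments):
    """Group points joined by "from-to" segments into node sets.

    segments: iterable of (from_point, to_point) pairs taken from wire lists.
    Returns a sorted list of sorted point lists, one per connected circuit.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in segments:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra                 # merge the two continuities

    groups = defaultdict(set)
    for point in list(parent):
        groups[find(point)].add(point)
    return sorted(sorted(g) for g in groups.values())
```

As in the text, no knowledge of circuit function is needed: the segments can be entered in any order, and points that are electrically continuous end up in the same node set.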

TOPOLOGICAL  PATTERN  IDENTIFICATION 

Once the network trees have been produced, the next task of the analyst is to identify the basic
topological patterns that appear in each tree. Five basic patterns exist: (1) The single line (no-node)
topograph, (2) The ground dome, (3) The power dome, (4) The combination dome, and (5) The 'H' pattern.
These  patterns  are  illustrated  in  Figure  1.  One  of  these  patterns  or  several  in  combination  will 
characterize  the  circuitry  shown  in  any  given  network  tree.  Although  at  first  glance  a given  circuit  may 
appear  more  complex  than  these  basic  patterns,  closer  inspection  reveals  that  the  circuit  is  actually 
composed  of  these  basic  patterns  in  combination.  As  the  sneak  circuit  analyst  examines  each  node  in  the 
network  tree,  he  must  identify  which  pattern  or  patterns  that  node  is  part  of  and  apply  the  basic  clues 
that  have  been  found  to  typify  sneak  circuits  involving  that  particular  pattern. 


CLUE  APPLICATION 


Associated with each pattern is a list of clues to help the analyst identify sneak circuit conditions.
These clues are questions that the analyst must ask about the circuitry in question. As an example, the
single line topograph (Figure 1) would have clues such as:

(1)  Is switch S1 open when load L1 is desired?

(2)  Is switch S1 closed when load L1 is not desired?

(3)  Does the label S1 reflect the true function of L1?

(4)  Et cetera.

Obviously,  sneak  circuits  are  rarely  encountered  in  this  topograph  because  of  its  simplicity.  Of  course, 
this  is  an  elementary  example  and  is  given  primarily  as  the  default  case  which  covers  circuitry  not 
included  by  the  other  topographs. 

With each successive topograph, the clue list becomes longer and more complicated. The clue list
for the 'H' pattern includes over 60 clues. This pattern, because of its complexity, is associated with
more sneak circuits than any of the previous patterns. Almost half of the critical sneak circuits
identified to date can be attributed to the 'H' pattern. Such a design configuration should be avoided
whenever possible. The possibility of current reversal through the 'H' crossbar is the most commonly
used clue associated with 'H' pattern sneak circuits and so will be illustrated in the example below.

THE  RELUCTANT  REDSTONE 


The circuitry presented in Figure 2 represents part of the ignition control circuitry for a
Redstone booster used during the Mercury Program in the early 1960's. This schematic would not be encoded
for sneak circuit analysis because it is not considered to be a "manufacturing level" schematic. It does
serve to illustrate, however, the difficulty involved in separating a given circuit from its surrounding
circuitry and simplifying it to the extent necessary for an effective analysis. As can be seen in one of
the network trees for this circuit shown in Figure 3, the computer removes all connectors, terminal blocks,
unnecessary nodes, splices, and any other extraneous circuit elements that do not affect the actual
electrical continuities involved. Only active circuit elements remain in the completed network trees.

When  the  network  tree  has  been  obtained,  a sneak  circuit  analyst  examines  each  node  in  sequence, 
determining  which  of  the  basic  topographs  the  node  is  a part  of  and  applying  the  clues  that  pertain  to 
those  topographs.  After  all  such  clues  have  been  applied,  all  sneak  circuits  present  in  that  network 
tree  will  have  been  identified.  In  the  Redstone  booster  case,  the  pertinent  clue  is  the  'H'  pattern  clue 
relating  to  reverse  currents  as  previously  discussed.  This  clue  would  be  stated  as: 

"Is  it  possible  for  current  to  flow  in  both  directions  through  the  'H'  cross  bar?" 

If the answer to this question is yes, a sneak circuit is not automatically indicated, as this may
have been the design intent. The analyst must determine whether or not this is a desired condition within
all foreseeable circumstances. In the case of the Redstone circuit, the condition definitely was not
desired. The design intent was that the Cutoff Command relay contacts or the Pad Abort switch should
energize the Engine Cutoff coil and the Abort Indicator coil. When the Launch Command contacts close, they
should turn on the Launch indicator lights only. The possible reverse current can only exist when the ground
below the indicator lights is lost. This would occur if the Tail Plug umbilical opened before the Control
umbilical separated. If this happened, current would flow through the Launch Command relay contacts,
the Launch indicator lights, through the suppression diode of the Abort Indicator coil and finally through
the Engine Cutoff coil to ground. By this means, the Launch Command can activate the Engine Cutoff coil
if the Tail Plug umbilical separates before the Control umbilical.



FIGURE  2 

RELUCTANT  REDSTONE  SIMPLIFIED  SCHEMATIC 


Unlikely as this event may appear, it did occur on November 21, 1960. After more than 50 consecutive
successful Redstone booster launches, a Redstone booster with a Mercury capsule on it was to be launched.
When the ignition command was given, the booster fired. After lifting several inches from the pad, the
engine inexplicably cut off. The booster settled back on the pad. The Mercury capsule jettisoned and
deployed its parachutes. Damage was slight, but a highly explosive rocket with no means of control sat
on the pad for 28 hours. No one dared to approach the Redstone booster until its batteries drained
down and the liquid oxygen evaporated. Later investigation revealed that the Tail Plug umbilical had
separated 29 milliseconds prior to Control umbilical disconnect. This was enough time for the sneak
circuit shown in Figure 3 to occur and abort the mission.
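
The 'H' pattern clue can be illustrated with a small path search. The sketch below is not the analysis procedure used by the authors; it is a qualitative model built only from the description above, with hypothetical node names (`BUS`, `LAUNCH`, `CUTOFF`, `LIGHTS_LOW`). It assumes that a node tied directly to ground cannot drive anything downstream and that the Engine Cutoff coil always has a return to ground.

```python
from collections import deque


def reachable(edges, source, grounded):
    """Nodes that can be driven from `source`; a hard-grounded node stops propagation."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for start, end in edges:
            if start == node and end not in seen and end not in grounded:
                seen.add(end)
                queue.append(end)
    return seen


def engine_cutoff_energized(launch_cmd, cutoff_cmd, tail_plug_connected):
    """True if the Engine Cutoff coil sees supply voltage in this simplified model."""
    edges = [
        ("LAUNCH", "LIGHTS_LOW"),  # Launch indicator lights
        ("CUTOFF", "LIGHTS_LOW"),  # Abort Indicator coil, intended direction
        ("LIGHTS_LOW", "CUTOFF"),  # suppression diode: reverse flow through the 'H' cross bar
    ]
    if launch_cmd:
        edges.append(("BUS", "LAUNCH"))  # Launch Command contacts closed
    if cutoff_cmd:
        edges.append(("BUS", "CUTOFF"))  # Cutoff Command contacts or Pad Abort switch closed
    # The Tail Plug umbilical supplies the ground below the indicator lights.
    grounded = {"LIGHTS_LOW"} if tail_plug_connected else set()
    # The Engine Cutoff coil sits between CUTOFF and a ground assumed always present.
    return "CUTOFF" in reachable(edges, "BUS", grounded)
```

With the Tail Plug ground present, the Launch Command alone leaves the coil unpowered; with that ground lost, the same command reaches the coil through the cross bar, which is the sneak path described above.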
FIGURE 3

REDSTONE  NETWORK  TREE 


SNEAK  CIRCUIT  ANALYSIS  REPORTS  PRODUCED 

Sneak  circuit  analysis  of  a system  produces  the  following  three  general  categories  of  outputs. 

(1)  Drawing  Error  Reports,  (2)  Design  Concern  Reports  and  (3)  Sneak  Circuit  Reports. 

DRAWING  ERROR  REPORTS  disclose  document  discrepancies  identified  primarily  during  the  data  encoding 
phase  of  the  Sneak  Circuit  Analysis  effort. 

DESIGN  CONCERN  REPORTS  describe  circuit  conditions  that  are  unnecessary  or  undesirable  but  which 
are  not  actual  sneak  circuits.  These  would  include  single-failure  points,  unsuppressed  inductive  loads, 
unnecessary components, and inadequate redundancy provisions. A number of these kinds of conditions
usually  are  identified  whenever  an  analyst  examines  a circuit  at  the  level  of  detail  required  for  a formal 
Sneak  Circuit  Analysis. 

SNEAK  CIRCUIT  REPORTS  delineate  the  sneak  conditions  identified  during  the  analysis.  These  reports 
typically  fall  into  four  broad  categories: 

(1)  Sneak  Paths  which  allow  current  to  flow  along  an  unsuspected  path  or  in  an  unintended  direction. 

(2) Sneak Timing which causes functions to be inhibited or to occur unexpectedly.

(3) Sneak Labels on switches or indicators which cause incorrect actions to be taken by operators.

(4)  Sneak  Indications  which  cause  ambiguous  or  false  displays. 

Sneak  Circuit  Analyses  have  been  performed  on  more  than  fifty  complex  electrical  systems.  These 
analyses  have  included  such  diverse  projects  as  Apollo,  Space  Shuttle  and  F-8  Digital  Fly-By-Wire  for  the 
National  Aeronautics  and  Space  Administration;  Pershing  Missile  and  F-4C  Flight  Control  System  for  the 
military;  the  N-Reactor  emergency  control  and  shutdown  systems  in  the  Nuclear  Power  field;  and  the 
Thistle  Field  A Platform  upending  and  placement  control  systems  for  the  oil  industry. 

Figure  4 illustrates  a typical  control  system  Sneak  Circuit  (identified  during  the  analysis  of  a 
Digital  Fly-By-Wire  Flight  Control  System).  Sneak  Circuit  Analysis  is  applicable  to  almost  any  complex 
electrical  system  involving  either  analog  or  digital  circuitry  or  both.  Software  Sneak  Analysis  employs 
an  analogous  method  developed  for  analyses  of  complex  software  systems.  Without  exception,  critical 
sneak  circuit  conditions  have  been  identified  in  every  system  investigated,  even  those  that  have  been 
thoroughly  analyzed  by  other  means,  comprehensively  tested,  simulated,  breadboarded,  and  operated  for 
a number  of  years.  Sneak  circuits  are  apparently  universal,  and  Sneak  Circuit  Analysis  can  and  does 
find  them. 
DESCRIPTION

The correct operation of each of the primary systems illustrated requires power at both points A and
B.  The  backup  systems  require  power  at  the  points  labeled  C.  The  Channel  B power  feed  circuitry  shown 
above  is  typical  of  how  all  three  channels  should  have  been  wired.  The  wires  from  TB104  pins  G and  E of 
the  paddle  switches  to  Channel  A circuitry  and  Channel  C circuitry  respectively  were  inadvertently 
reversed.  This  wiring  error  would  result  in  the  loss  of  both  Channel  A primary  and  backup  systems  and 
the  Channel  C primary  system  through  a single  bus  failure  involving  Channel  A.  Due  to  the  voting  logic 
of  the  primary  system,  the  Channel  B primary  system  is  automatically  disabled  when  the  other  two  primary 
systems  are  lost.  A failure  involving  the  Channel  C bus  would  have  a similar  effect. 

A single failure involving Bus A will therefore disable the entire primary system and the Channel A
backup system. A single failure involving Bus C will disable the entire primary system and the Channel C
backup system.


FIGURE  4 

SNEAK  PATH  IN  FLIGHT  CONTROL  SYSTEM  PRIME  POWER  DISTRIBUTION 
SUMMARY

The techniques discussed in this paper are suitable for use while the flight control system is
performing its normal task. Most of these techniques are also applicable on the ground.

Since  many  inputs  and  outputs  of  a digital  flight  control  system  are  analog  signals,  some  analog 
testing  capability  is  required.  The  basic  concepts  of  analog  testing  may  often  be  carried  into  digital 
testing. 

Monitoring  involves  comparison  of  the  performance  of  the  item  under  test  to  the  performance  of  a 
model.  If  the  model  is  dynamic,  it  is  a suitable  standard  of  performance  over  a wide  range  of  operating 
conditions;  if  the  model  is  fixed,  it  is  a suitable  standard  of  performance  only  under  a specific  set  of 
operating  conditions. 

Where redundant information is available in the system, the system itself may form all or part of
the model. Computations can extract model information from signals that contain composite data. A
single accelerometer displaced from the center of gravity can, for example, be used to verify the
performance of a rate gyro and an accelerometer as well.

Setting tolerances on signal levels for good-bad decisions is a difficult problem often solved
empirically. Parameter estimation techniques permit detection of component electrical parameter
shifts. The tolerances used in good-bad decisions are then those of component values which are
independent of signal amplitudes. The parameter estimation test technique also provides automatic fault
isolation.

Stimulated  monitoring  is  possible  where  the  item  under  test  is  time  multiplexed  or  where  the  stimulus 
can  be  designed  to  have  negligible  effect  on  the  system  performance. 

Software simulation is a useful modeling technique for digital item testing. Where stimulated testing
is possible, the model need be only a set of input/output patterns. Dynamic digital models which monitor
digital items on line are complex and tend to require a large amount of computer time.

Fixed  models,  such  as  parity  checkers  and  watch  dog  timers,  are  frequently  used  in  digital 
monitoring. 

Wrap-around  testing  is  a practical  preflight  and  post-flight  test  technique. 
INTRODUCTION

Built-in test capability in a digital flight control system can reduce support costs, reduce
turnaround time and improve mission success probability. Because of these potential benefits BIT
(built-in test) is increasingly incorporated as an integral feature of system design. Where redundancy exists,
comparison testing of corresponding signals or parameters of the redundant elements is an effective
built-in-test technique. Since comparison of redundant modules has had ample exposure in the
fault-tolerant literature, this paper will emphasize other techniques for in-flight testing of digital flight
control systems. The emphasis in this paper is upon techniques which are applicable in flight. These
same techniques are usually applicable on the ground.


DEFINITIONS 

For  the  purposes  of  this  paper,  testing  will  be  defined  as  a process  designed  solely  for  the  purpose 
of  determining  whether  an  item  is  functioning  or  is  capable  of  functioning  within  acceptable  limits.  The 
testing  process  may  include  one  or  more  elemental  tests  which  are  defined  as  the  process  of  determining 
whether  a measured  quantity  lies  within  a range  of  values. 

On-line testing is that testing which is carried out while the item is available for its normal function
and does not delay or disrupt the execution of that normal function.

Monitoring is on-line testing which does not involve the use of overt test stimuli. Testing which
involves the use of a stimulus superimposed on the nominal signal, but not affecting the system
performance, may also be included in the monitoring category.

Comparison testing is testing which involves direct comparison of a parameter or signal of an item
to a standard like parameter or signal.

Reasonableness testing is testing which can detect malfunction of the equipment under test only when
that malfunction causes the signal or parameter in question to fall outside of a range of values. The test
is inconclusive for signals or parameters inside the "reasonable" range of values.

In-line testing is that testing which can be performed by a unit on itself without reference to a
redundant item, built-in-test equipment, or external test equipment.

Functional  testing  is  that  testing  which  establishes  whether  an  item  under  test  functions  within 
specified  performance  limits. 

Fault  isolation  testing  is  that  testing  which  establishes  the  physical  location  of  the  failure  within  a 
faulty  item. 

A dynamic model is one whose transfer characteristic is a good approximation to that of
the unit it models over a wide range of input conditions.

A fixed model is one which is a valid representation of the item under test only for a specific set of
fixed input values.


IN-FLIGHT  MONITORING  TECHNIQUES 

Monitoring  utilizes  the  information  which  can  be  obtained  from  the  item  while  it  is  in  its  normal  use. 
Monitoring  does  not  introduce  any  significant  perturbations  into  the  behavior  of  the  item  in  question. 
Monitoring  may  require  the  use  of  test  equipment. 

Analog  Monitoring 

Since most sensors used by digital flight control systems are basically analog devices, some
consideration of analog monitoring techniques is necessary. Where the analog device, circuitry or
equipment is redundant, comparison of corresponding signals or parameters of the redundant units is valid
and cost effective. Where such redundancy does not exist, one or more of the following techniques may
be useful:

• Dynamic Model - Input and Output Accessible -- When both input and output of an analog
device are available, it is possible to model the unknown device and perform a reasonableness
test by comparing the results obtained from the model with the actual signal. The
model may be a hardware or a software model. When the model is not continuously
connected, as might be the case with a software model or a time-shared hardware model, care
must be taken to account for the state of energy storage elements within the flight
hardware at connection time. Figure 1 shows the dynamic model concept applied to two simple
analog signal processing elements. In Figure 1a an inverse model is used since that model
is differentiating and avoids the initial condition problem of an integrating model.
In Figure 1b a forward model is used because the flight hardware is basically a high-pass
filter. Here the initial condition problem is not severe because the time constant T3 is
relatively short.

In  both  cases  of  Figure  1,  limits  are  established  in  terms  of  a percentage  of  value  rather 
than  a fixed  magnitude  tolerance  band.  If  the  signal  falls  outside  of  the  "GO  BAND"  the 
output  logic  level  will  go  to  a logic  zero.  An  integrating  block  is  shown  on  the  output  logic 
to  eliminate  any  noise  pulses  which  may  otherwise  produce  false  failure  indications. 

The  BIT  models  of  Figure  1 may  be  implemented  either  in  hardware  or  software.  Where 
general-purpose  computing  capability  is  available  during  the  BIT  time,  software  models 
may  be  used.  Hardware  models  may  utilize  either  analog  circuitry  or  hardware  digital 
filters.  The  absolute  value  computations  and  the  comparisons  are  conveniently  done  in  the 
computer  rather  than  in  hardware. 
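
The percentage tolerance band and the integrating block on the output logic can be sketched in software as follows. This is an illustration only, not the implementation of Figure 1: the names `tolerance`, `floor` and `trip_level` are assumptions, the integrator is approximated by an up/down counter, and the small `floor` term is added here so that the band does not collapse to zero width when the model output passes through zero.

```python
def go_flags(model, actual, tolerance=0.10, floor=0.01, trip_level=3):
    """Return one GO (True) / NO-GO (False) flag per sample.

    model, actual : sequences of model output and measured output
    tolerance     : band half-width as a fraction of the model output
    floor         : minimum band half-width (assumed, avoids a zero-width band)
    trip_level    : accumulated out-of-band count needed to declare a failure
    """
    flags = []
    accumulator = 0
    for m, y in zip(model, actual):
        band = tolerance * abs(m) + floor
        if abs(y - m) > band:
            accumulator += 1                         # outside the GO band
        else:
            accumulator = max(accumulator - 1, 0)    # inside: integrate back down
        flags.append(accumulator < trip_level)
    return flags
```

A single noise pulse raises the accumulator by one and decays again, so it never reaches `trip_level`; a sustained disagreement does.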

• Dynamic Model - Input Inaccessible -- When the input is inaccessible as, for example, in
inertial sensors, an alternate source of information is needed to permit dynamic
monitoring. This information must ultimately come from an independent sensor. In some
cases, a single independent BIT sensor can be used to test more than one flight sensor.

This  is  the  case  in  Figure  2 where  an  inner  stabilization  loop  uses  a rate  gyro  and  an 
accelerometer.  Since  an  accelerometer  mounted  in  a position  which  is  displaced  from  the 
center  of  gravity  along  an  axis  in  the  plane  of  rotation  will  sense  acceleration  which  has 
both  rotational  and  translational  components,  it  will  provide  information  adequate  to  verify 
both  the  rotational  rate  gyro  and  accelerometer  which  are  sensitive  to  the  same  inputs. 

In Figure 2 the primary flight sensors are used to compute the quantity

\[ \frac{C_1 K_2 s}{\tau s + 1}\, q + \frac{K_1}{\tau s + 1}\, a \]

where q is the rotational rate and a is the linear acceleration. Two other outputs are
computed, using an additional accelerometer, which are essentially duplicates of the primary
output. Since each of the three outputs depends upon only two sensors, if one output does
not agree with the other two a fault is detected and may in fact be isolated to a specific
sensor as shown in Table 1.
Figure 1a. Inverse Model Input and Output Accessible

Figure 1b. Forward Model Input and Output Accessible

Figure 2. Dynamic Modeling - Input Inaccessible


Table 1. Fault Isolation Properties of Figure 2

Output States: 0 = Minority, 1 = Majority

Primary   Derived #1   Derived #2   Fault
   1          1            0        Fore Accelerometer
   1          0            1        Rate Gyro
   0          1            1        Aft Accelerometer

The  comparison  of  the  three  outputs  may  be  carried  out  as  shown  in  Figure  3.  Here 
again  the  tolerance  band  for  reasonableness  testing  is  established  as  a percentage  of  one 
of  the  signals  being  compared.  Logic  circuitry  identifies  the  input  which  is  in  the 
minority.  The  comparison  and  logic  may  be  implemented  in  hardware,  but  they  are 
conveniently  done  in  software  if  computer  capability  is  available. 
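
A software version of this comparison might look like the sketch below. The state patterns and fault names are taken from Table 1; the function name, the 5 percent default tolerance and the "Ambiguous" result for patterns not listed in the table are assumptions made for illustration.

```python
# Output-state pattern (Primary, Derived #1, Derived #2) -> faulty sensor, from Table 1.
FAULT_TABLE = {
    (1, 1, 0): "Fore Accelerometer",
    (1, 0, 1): "Rate Gyro",
    (0, 1, 1): "Aft Accelerometer",
}


def isolate_fault(primary, derived1, derived2, tolerance=0.05):
    """Classify each output as majority (1) or minority (0) and look up the fault."""
    outputs = (primary, derived1, derived2)

    def agree(x, y):
        # Tolerance band expressed as a percentage of the signals being compared.
        return abs(x - y) <= tolerance * max(abs(x), abs(y))

    states = tuple(
        int(any(agree(outputs[i], outputs[j]) for j in range(3) if j != i))
        for i in range(3)
    )
    if states == (1, 1, 1):
        return "No Fault"
    return FAULT_TABLE.get(states, "Ambiguous")
```

For example, `isolate_fault(1.0, 2.0, 1.0)` places Derived #1 in the minority, which Table 1 attributes to the rate gyro.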

• Parameter Estimation -- The dynamic model techniques use models which have fixed
parameters, or at most programmable parameters. With these models, a fault is
detected when signals which should be equal are found to deviate by more than an
established tolerance level. The establishment of a suitable tolerance that will avoid nuisance
trips but yet detect failures early is a difficult problem which can only be solved in the
context of the particular application at hand. There are, however, definite tolerances on
the electrical and mechanical characteristics of individual components of a system. Since
the parameters of a transfer relationship are known functions of the component values, it
is possible to establish specific limits for the transfer relationship parameters. When the
transfer relationship is specified in factored form, each parameter is a function of only a
few components in the system. These functions are context free and limits on system
transfer parameters which represent the extremes obtainable with in-tolerance components
can be readily estimated.

Parameter estimation techniques can be used to compute the present value of the transfer
parameters of an analog system using the normal signals of the system. One technique
of parameter estimation is shown in Figure 4. Here the parameter of a software model
of the unit under test is adjusted by a parameter adjustment algorithm which minimizes
the mean squared error of the output of the model as compared to the measured output of
the unit under test. The use of this technique requires considerable computation
capability, but the computing time may be acceptable when operated in a tracking mode; that is,
using the last estimate of the values of the parameters as the starting point for the next
estimation in the next BIT interval.
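
The tracking mode can be sketched with the simplest possible case, a unit whose model is a pure gain. The adjustment algorithm of Figure 4 is not specified in the text, so plain gradient descent on the mean squared error is assumed here; the function name, `step` and `iterations` are likewise illustrative.

```python
def track_gain(u, y, k_start, step=0.1, iterations=50):
    """Adjust the model gain k so that k*u matches y in the mean-squared sense.

    u, y    : input and measured output samples from one BIT interval
    k_start : starting estimate (the previous interval's result in tracking mode)
    """
    k = k_start
    n = len(u)
    for _ in range(iterations):
        # Gradient of mean((y - k*u)^2) with respect to k.
        gradient = sum(-2.0 * ui * (yi - k * ui) for ui, yi in zip(u, y)) / n
        k -= step * gradient
    return k


# First interval: cold start far from the true gain, many iterations.
u = [1.0, -1.0, 0.5, -0.5, 1.0, -1.0]
k1 = track_gain(u, [2.5 * x for x in u], k_start=1.0, iterations=50)

# Next interval: the gain has drifted slightly; starting from k1 needs far fewer iterations.
k2 = track_gain(u, [2.6 * x for x in u], k_start=k1, iterations=10)
```

The resulting estimate would then be compared against limits derived from component tolerances rather than against signal-level tolerances.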


Parameter estimation can be used for fault isolation using the parameter values as a fault
signature. Figure 5 shows a simple linear circuit which may be fault isolated by use of
Table 2. In the construction of Table 2, if a parameter is more than 20 percent higher
than nominal it is given the weight 2. If less than 80 percent of nominal value it is given
the weight 0. If within ±20 percent of nominal it is given the weight 1. In this case the
faulty component can be located with no ambiguity.

Figure 5. Typical Linear Circuit: a) Circuit Diagram; c) Transfer Function

\[ \frac{K (p + A_1)(p + A_2)}{(p + A_3)(p + A_4)} \quad \text{or} \quad \frac{K (p + A_1)(p + A_2)}{p^2 + p B_3 + B_4} \]

Table 2. Analog Fault Dictionary

K    A1   A2   A3   A4   B3   B4   Replace
2    2    2    1    1    NA   NA   C1
2    2    1    1    1    NA   NA   U1
2    2    0    1    1    NA   NA   R1
2    1    2    1    1    NA   NA   C1
2    1    1    2    1    NA   NA   U2
2    1    1    1    1    NA   NA   R1
2    1    1    NA   NA   2    2    R3
2    1    1    0    2    NA   NA   R3
2    1    1    0    1    NA   NA   R4
1    2    2    1    1    NA   NA   R2
1    1    2    1    1    NA   NA   R2
1    1    1    1    2    NA   NA   C2
1    1    1    1    1    NA   NA   No Fault
1    1    1    1    0    NA   NA   C2
1    1    0    1    1    NA   NA   R2
0    1    1    2    1    NA   NA   R4
0    1    1    2    0    NA   NA   R3
0    1    1    1    1    NA   NA   R1
0    1    1    0    1    NA   NA   U2
0    1    0    1    1    NA   NA   C1
0    0    1    1    1    NA   NA   U1
0    0    0    1    1    NA   NA   U1
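
The weighting rule and the dictionary lookup are straightforward to express in software. In the sketch below the signatures are transcribed from the factored-form rows of Table 2 (the single row that uses B3 and B4 is omitted); the function names and the choice to return `None` for an unlisted signature are assumptions.

```python
PARAMETERS = ("K", "A1", "A2", "A3", "A4")

# Signature (weights of K, A1, A2, A3, A4) -> component to replace, from Table 2.
FAULT_DICTIONARY = {
    "22211": "C1", "22111": "U1", "22011": "R1", "21211": "C1",
    "21121": "U2", "21111": "R1", "21102": "R3", "21101": "R4",
    "12211": "R2", "11211": "R2", "11112": "C2", "11111": "No Fault",
    "11110": "C2", "11011": "R2", "01121": "R4", "01120": "R3",
    "01111": "R1", "01101": "U2", "01011": "C1", "00111": "U1",
    "00011": "U1",
}


def weight(measured, nominal):
    """2 if more than 20 percent high, 0 if below 80 percent of nominal, else 1."""
    ratio = measured / nominal
    if ratio > 1.2:
        return 2
    if ratio < 0.8:
        return 0
    return 1


def isolate(measured, nominal):
    """Return the component named by the fault signature, or None if it is not listed."""
    signature = "".join(str(weight(measured[p], nominal[p])) for p in PARAMETERS)
    return FAULT_DICTIONARY.get(signature)
```

With estimated parameters in hand, `isolate` reduces fault isolation to one dictionary lookup per BIT interval.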

• Stimulated Monitoring -- In many flight systems the signals present in an item may be
essentially quiescent for long periods of time with only short bursts of significant activity.
Monitoring techniques which depend upon the use of the signals normally existing in the
item under test may produce inconclusive results during these periods of inactivity. Since
an inactive period may be the most appropriate time to test, stimulated monitoring may
be considered.

In stimulated monitoring, a small energy signal, usually of zero mean, is superimposed
upon the normal signal. The signal is designed to produce negligible effect upon the
performance of the system. Typical signals are high-frequency or d-c tracer signals, where
appropriate, or doublet pulses to introduce cancelling transients. When manual
involvement is possible, an intentional maneuver can introduce the signal activity needed.

All  of  the  dynamic  model  testing  techniques  are  suitable  for  use  when  a stimulus  signal  is 
present.  The  use  of  a stimulus  makes  it  possible  to  use  simpler  fixed  models  as  well. 

• Fixed Models -- A fixed model is simply a nominal fixed value with fixed limits against
which the signal obtained from the unit is compared when all assumed operating conditions
are fulfilled. The fixed model may be implemented in hardware or software. The use of
a fixed model for monitoring is illustrated in Figure 6. In this case a low-energy signal
is within the pass band of the unit under test but is rejected by the rest of the system.
Again the comparison may be hardware or software. This test may be sensitive only to a
subset of the possible failure modes of the item being tested.

Sometimes the monitor may utilize an attribute of the signal which is not used to convey
information in the normal use of the signal. An example of this is shown in Figure 7. The
purpose of the monitoring in this case is to detect intermittents. The signal is a modulated
carrier. Transients are detected by a pair of counters. One counter counts the transitions
of the basic frequency source, the other counts the transitions of the monitored signal. If
the counters do not agree over a given time period, a failure has occurred.
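
The two-counter scheme can be sketched as below, assuming both the frequency source and the monitored signal have already been reduced to sampled logic levels over the same time period. The function names are illustrative and the hardware of Figure 7 is not reproduced.

```python
def count_transitions(samples):
    """Count level changes between consecutive logic samples."""
    return sum(1 for previous, current in zip(samples, samples[1:]) if previous != current)


def intermittent_detected(reference, monitored):
    """Flag a failure when the two transition counts disagree over the period."""
    return count_transitions(reference) != count_transitions(monitored)


# Reference carrier and a monitored copy with a brief dropout (samples 5 to 8 held low).
reference = [0, 1] * 8
dropout = list(reference)
dropout[5:9] = [0, 0, 0, 0]
```

Here `intermittent_detected(reference, dropout)` is true because the dropout suppresses several carrier transitions.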

Digital  Monitoring 


Special-purpose digital hardware can frequently be described by a reasonably tractable fixed
input-output relationship. Where this is the case, most of the monitoring techniques discussed under analog
monitoring have their counterparts for monitoring digital items. This is true, for example, where
hardware digital filters are used to implement signal processing functions. The same reasoning applies
to software modules which have fixed input-output relationships such as an integration routine. Software
of this type can be monitored by appropriate modifications of the techniques discussed under analog
monitoring. The discussions of this subsection will be, therefore, restricted to those techniques which
are peculiarly suitable to testing digital items.

Figure 6. Fixed Limit Stimulated Monitoring

Figure 7. Intermittent Detection Fixed Model Monitoring

Dynamic Modeling. Digital hardware can, of course, be modeled by duplication. This provides
hardware redundancy and can be used to improve the system reliability. Lower-cost alternatives are
time-shared reconfigurable hardware and software modeling. Since time-shared models and software
models (that is, part-time models) do not usually experience the full history of the item under test they
must have some provision for determining the internal states of the item under test. In the case of a
general-purpose computer the number of internal states is extremely large.

To avoid undue complexity and difficult state initialization, digital systems are usually modeled as a
collection of smaller models. The test comparison spans only the hardware or software represented by a
single model. Dynamic models are used only where the number of states is small and state information
is readily available. This piecewise modeling has the advantage of inherent fault localization. Where the
number of internal states increases, dynamic modeling becomes less useful.

A possible use of software dynamic modeling is shown in Figure 8. Here a test processor contains a
software simulation of the item being tested. The model must permit unknown states (ternary simulator).
The test processor first acquires a number of samples of the inputs and outputs of the item. The inputs
are then applied sequentially to the software model until all of the states have been established.
Obviously, any information about internal states gleaned from the outputs of the item is also used.

Figure 8. A Digital Dynamic Monitor: a) Digital Dynamic Monitor; b) Main Test Steps

After  the  internal  states  have  been  established,  the  remaining  inputs  are  applied  to  the  model  and  the 
simulated  outputs  compared  to  the  corresponding  output  acquired  from  the  item  under  test.  If  a mismatch 
occurs  an  error  has  been  detected. 
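
These test steps can be sketched for a deliberately small item. The unit chosen here, a three-stage shift register, and the function names are hypothetical; `None` stands for the unknown third logic value of a ternary simulator, and outputs are compared only after the model's states have been established by the applied inputs.

```python
def simulate_shift_register(inputs, stages=3):
    """Ternary simulation: every stage starts unknown (None) and fills as inputs arrive."""
    state = [None] * stages
    outputs = []
    for bit in inputs:
        outputs.append(state[-1])       # output is the last stage before the shift
        state = [bit] + state[:-1]      # shift the new input in
    return outputs


def outputs_match(inputs, observed, stages=3):
    """Compare observed outputs with the model wherever the model output is known."""
    predicted = simulate_shift_register(inputs, stages)
    return all(p is None or p == o for p, o in zip(predicted, observed))
```

For the input sequence `[1, 0, 1, 1, 0, 0]` the model output is unknown for the first three samples and then equals the input delayed by three samples, so only the last three acquired outputs are checked.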

The  test  processor  of  Figure  8 can  be  slow  since  it  operates  on  stored  data,  and  it  may  be  time 
shared  among  several  test  problems. 

Echo Tests. Good digital system and software design practice which has evolved to provide
error-free communication between the central processor and asynchronous peripherals inherently includes test
provisions. These same provisions with some expansion can be used for BIT. One general class of tests
used to verify correct communication may be termed echo tests.

A typical  echo  test  is  that  used  in  magnetic  tape  units.  In  this  case  a read  head  reads  from  the  tape 
the  data  which  was  just  written  by  the  write  head.  Simple  comparison  of  the  read  and  write  data  verifies 
correct  operation. 

Another  echo  test  is  the  requirement  that  an  addressed  peripheral  respond  with  a "ready”  signal 
before  data  is  sent  by  the  processor. 

Expansion of the echo test to more completely verify a digital function may be performed by the
hardware shown in Figure 9. While this requires some additional hardware, the hardware is simple and
inexpensive and its presence permits complete checkout of the selection of one of n analog data sources
for the A/D converter. The use of a separate data buffer and address decoder permits verification that
only one of the 2^n switches is commanded "on" and that the rest are commanded "off."
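
The check performed on the echoed switch commands can be sketched as a one-hot comparison. The sketch assumes the separate buffer returns the switch states as a single word with one bit per switch; `switch_word`, `echo_check` and that word format are illustrative assumptions, not details taken from Figure 9.

```python
def switch_word(address, n_address_bits):
    """Expected one-hot command word: only the addressed switch is on."""
    if not 0 <= address < (1 << n_address_bits):
        raise ValueError("address out of range")
    return 1 << address


def echo_check(address, echoed_word, n_address_bits):
    """Pass only if the addressed switch is echoed on and every other switch is off."""
    return echoed_word == switch_word(address, n_address_bits)
```

A word with two bits set, or with none set, fails the check even if the addressed switch itself is correct.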

Stimulated Monitoring. Many digital applications involve time multiplexing of some part of the
system such as a data bus or processing unit. In these cases a time slot is usually available for test
activity. Other elements of the system have inactive periods during which testing can be performed.

In  either  case  it  is  usually  permissible  to  inject  stimuli  or  otherwise  exercise  the  item  during  the 
test  period.  The  test  procedure  must  be  such  that  no  information  is  lost  as  a result  of  changing  the  state 
of  any  memory  elements  in  the  item  under  test. 

A stimulated monitoring technique is illustrated in Figure 10. The test processor uses a set of "test
vectors" generated off-line and verified by off-line simulation to detect a high percentage of the possible
faults of the unit under test. The test processor applies the input vectors in the prescribed order and
compares the actual response vectors to the stored "good" responses. The test vectors provide an
initializing sequence that brings the internal states to a known condition. Tests of this type may be
performed by special-purpose hardware.

The  test  vector  technique  can  also  provide  fault  isolation  to  a small  ambiguity  region.  The  set  of 
test  vectors  can  be  divided  into  groups  for  interleaving  with  the  normal  activity  of  the  unit  under  test, 
provided  an  initializing  sequence  is  provided  for  each  restart  point. 
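
The test vector procedure can be sketched with a toy unit under test. The two-bit counter, its stuck-bit fault and the vector set below are all hypothetical; the first vector is the initializing sequence (a reset), after which each response is compared with the stored "good" response.

```python
class Counter2Bit:
    """Hypothetical unit under test: two-bit counter with synchronous reset."""

    def __init__(self, state=0):
        self.state = state  # power-up state is arbitrary

    def step(self, reset, enable):
        if reset:
            self.state = 0
        elif enable:
            self.state = (self.state + 1) % 4
        return self.state


class StuckBitCounter(Counter2Bit):
    """The same counter with output bit 1 stuck at zero."""

    def step(self, reset, enable):
        return super().step(reset, enable) & 0b01


# ((reset, enable), expected output); the first vector initializes the internal state.
TEST_VECTORS = [
    ((1, 0), 0), ((0, 1), 1), ((0, 1), 2),
    ((0, 0), 2), ((0, 1), 3), ((0, 1), 0),
]


def run_test(unit, vectors):
    """Apply the vectors in order; return the index of the first mismatch, or None."""
    for index, (inputs, expected) in enumerate(vectors):
        if unit.step(*inputs) != expected:
            return index
    return None
```

The index of the first failing vector gives a coarse fault signature, which is one way the technique can narrow a fault to a small ambiguity region.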

Fixed Models. The use of fixed models is one of the most prevalent techniques used in digital
monitoring. Parity checking may be considered as fixed monitoring. In the parity case a signal attribute not
used to perform any flight control function is utilized to monitor the performance of the item under test.

Figure 9. Typical Echo Check for BIT

Figure 10. A Digital Stimulated Monitor: a) Stimulated Monitoring; b) Main Test Steps


The watch dog timer is another example of fixed modeling. Properly functioning software in a
properly functioning processor will reset the timer at fixed time intervals within a specified tolerance.
An early or late reset will be interpreted as evidence of a malfunction.

Memory  sum  check  involves  a fixed  model  which  is  simply  the  sum  of  all  the  words  in  a section  of 
memory  when  interpreted  as  binary  numbers. 
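
Both of these fixed models reduce to a few lines of software. The sketch below assumes 16-bit words with the sum truncated to the word length, and a watch dog that is given the times of successive resets; the function names, the word length and the time units are illustrative assumptions.

```python
def memory_sum(words, word_bits=16):
    """Sum of the words interpreted as binary numbers, truncated to the word length."""
    return sum(words) & ((1 << word_bits) - 1)


def memory_ok(words, stored_sum, word_bits=16):
    """Fixed model: the stored sum for this section of memory."""
    return memory_sum(words, word_bits) == stored_sum


def watchdog_ok(reset_times, period, tolerance):
    """Every interval between resets must equal the period within the tolerance."""
    return all(
        abs((later - earlier) - period) <= tolerance
        for earlier, later in zip(reset_times, reset_times[1:])
    )
```

Note that the watch dog check rejects early resets as well as late ones, as the text requires.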

Sample problems interleaved with the operational program are a form of fixed model monitoring.
Figure 11 shows a typical sample problem monitor used for digital flight control monitoring. The analog
signal generator can be a very simple device since the only requirement is that the fundamental frequency
be on one of the slopes of the notch. The analog filter must be stable in response shape and frequency.
This monitor will check most of the instructions of the flight control processor as well as the arithmetic
section, memory, I/O and the A/D and D/A converters.

Self-Checking  Circuits.  A self-checking  circuit  utilizes  an  error-detecting  code.  In  fault-free 
operation,  the  output  is  always  an  acceptable  code  word.  A detectable  error  will  produce  a non-code 
word.  Self-checking  circuits  include  checkers  which  will  indicate  the  presence  of  non-code  words. 

While  self-checking  will  inevitably  increase  circuit  complexity  it  need  not  result  in  a proportionate 
increase  in  failure  rate,  size  or  cost. 

Implementation of self-checking at the integrated circuit level is very appealing. Many ICs are pin
limited rather than complexity limited. It may be possible to obtain as many or nearly as many
self-checking logic functions in an IC package as are now provided in non-self-checking logic. Since the
failure rate (and cost) of ICs seems to depend more upon the number of pins than upon the internal
complexity, the penalty paid for self-checking may be quite acceptable. Self-checking at the IC level has the
added advantage of immediate fault isolation to the IC level.
Figure 11. Sample Problem Monitoring


Unfortunately the state-of-the-art of self-checking circuits has not progressed to the point of widespread
applicability. Simple fault-detecting codes exist which are code preserving under arithmetic
operations (exclusive OR), but no simple fault-detecting codes are code preserving under the logical
operations (AND/OR). The work to date has generally used redundancy and comparison for testing
these logic functions. Where the logic structure can be arranged to do so economically, arithmetic logic
units which have an arithmetic mode can be used to implement AND/OR logic. These units are then
placed in the arithmetic mode for self-checking.
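
The distinction between exclusive OR and AND/OR can be checked exhaustively for the simplest error-detecting code, a single parity bit. The sketch below is illustrative only: single-bit parity stands in for a "simple" code, and the function names are assumptions. It tests whether the parity of a result is fixed by the parities of the two operands.

```python
import operator

def parity(x):
    """Return 1 if x has an odd number of one bits, else 0."""
    return bin(x).count("1") & 1

def parity_predictable(op, width=4):
    """True if parity(op(a, b)) depends only on (parity(a), parity(b))
    for every pair of width-bit operands."""
    seen = {}
    for a in range(1 << width):
        for b in range(1 << width):
            key = (parity(a), parity(b))
            result = parity(op(a, b))
            if seen.setdefault(key, result) != result:
                return False  # same operand parities, different result parity
    return True

xor_ok = parity_predictable(operator.xor)   # parity is preserved under XOR
and_ok = parity_predictable(operator.and_)  # not preserved under AND
or_ok = parity_predictable(operator.or_)    # not preserved under OR
```

For AND and OR, two operand pairs with identical parities can give results of different parity, so no checker working on the parity bits alone can predict the correct output parity.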

Digital  Memory  Monitoring 

Memory monitoring is treated as a separate topic because of the complexity and criticality of
memory systems. Read/write memories are a particular problem since there is usually no absolute
way to determine from alternate sources the correct value of stored data.

Complete  memory  redundancy  at  two,  three,  or  higher  levels  is  used  for  fault  tolerance  in  critical 
applications.  Where  complete  redundancy  is  not  practical,  partial  redundancy  is  often  used  in  the  form 
of error detecting or detecting/correcting codes. Where the memory contents are fixed the memory sum
check (a minimal redundancy) is often used. In semiconductor memories the effectiveness of error
detection is strongly affected by the physical partitioning of the memory system.

Single Bit per Word Parity. A single bit appended to the word is set to zero or one as required to
make the number of ones even (odd). An error in a single bit position will cause the parity to be incorrect.
Single failures which cause simultaneous errors in more than one bit position may not be detected.

For semiconductor memories, organizations in which each bit is produced by a separate chip are well
suited to single-bit-per-word parity.

Where all bits of the word, including the parity bit, are produced by a single semiconductor chip,
a single bit failure in address decoding will cause half of the words read to be incorrect. The parity bit
in this case will not detect any of the errors. If the parity bit has a separate addressing structure,
50 percent of the data errors caused by a single address bit failure will be detected.
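
The effect of parity-bit placement on address-failure coverage can be illustrated with a small simulation. The sketch assumes one address bit stuck at zero, random 16-bit data and a fault-free decoder for the separately addressed parity store; the memory size, seed and names are arbitrary choices made for the example.

```python
import random

def parity(x):
    """Return 1 if x has an odd number of one bits, else 0."""
    return bin(x).count("1") & 1

def simulate(address_bits=8, word_bits=16, stuck_bit=3, seed=1):
    """Count data errors and parity detections for one address bit stuck at 0."""
    rng = random.Random(seed)
    size = 1 << address_bits
    data = [rng.getrandbits(word_bits) for _ in range(size)]
    par = [parity(w) for w in data]          # stored parity bit of each word
    errors = same_chip_hits = separate_hits = 0
    for addr in range(size):
        faulty = addr & ~(1 << stuck_bit)    # address actually decoded
        word = data[faulty]
        if word == data[addr]:
            continue                         # this read returns correct data
        errors += 1
        # Parity bit shares the faulty decoder: it belongs to the wrong word too.
        same_chip_hits += parity(word) != par[faulty]
        # Parity bit addressed separately (decoder assumed healthy).
        separate_hits += parity(word) != par[addr]
    return errors, same_chip_hits, separate_hits

errors, same_chip_hits, separate_hits = simulate()
```

With the parity bit on the same chip the wrong word always carries its own consistent parity, so nothing is flagged. With separate addressing roughly half of the erroneous reads are flagged, since two unrelated words agree in parity about half the time.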

Byte-Wide Parity. A common parity scheme provides one parity bit for each byte in the word. This
scheme will detect all single bit errors per byte. If all bits of a byte, including the parity bit, are on a
single chip, no addressing failures will be detected. If the parity bits have a separate addressing structure,
50 percent of the byte addressing failures will be detected.

Chip-Wide Parity. In this scheme, intended primarily for semiconductor memories, parity is computed
for groups formed by taking one bit from the corresponding bit position in each chip forming
the word. There are as many groups as bits per chip. Figure 12 illustrates chip-wide parity
[...] formed from memory chips four bits wide.

[...] failure of the address decoding on a single chip will cause one group of four bits
[...] this failure can cause at most one bit of a parity grouping to change, any data
[...] addressing failure will be detected. When the chips are one bit wide the chip-wide
[...] a single parity bit per word, but any failure of a single chip that causes a data
[...]


[...] Error correcting codes require greater redundancy than error-detecting
[...] of memory size. For a 16-bit word, five Hamming parity bits are
[...] or single-error correction and six bits are required if double-error
detection is desired.
Figure 12. Example of Chip-Wide Parity Checker Architecture
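
The Hamming check-bit counts quoted for a 16-bit word follow from the standard bound 2^r >= m + r + 1 for single-error correction of m data bits, with one further overall parity bit when double-error detection is added. The short sketch below evaluates that bound; the function names are illustrative.

```python
def sec_check_bits(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r

def sec_ded_check_bits(data_bits):
    """Single-error correction plus double-error detection: one extra parity bit."""
    return sec_check_bits(data_bits) + 1

bits_sec_16 = sec_check_bits(16)          # five check bits for a 16-bit word
bits_sec_ded_16 = sec_ded_check_bits(16)  # six check bits for a 16-bit word
```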


Memory Sum Check. A common memory test technique is to form the correct sum of all the numbers
that should be stored in a block of memory. This check sum is also stored (preferably in another
block) for later use. When blocks are read out of memory or during a self-test cycle the summation
of all words in the block is computed and compared against the check sum word.

This technique is most applicable to read-only memories and mass storage memories. It can be
applied, at considerable overhead, to directly executing program memories.

Check sum provides poor coverage for the most significant bit position unless double-precision
addition or end-around carry is used.

Where the carry digit is retained, sum check will detect 100 percent of all single-column data errors.
For address errors (single address bit failed) the probability of data error detection is less than 100 percent.
Where the address error affects only a single column, the probability of data error detection is
only 80 percent for a 16-word group, 90 percent for a 64-word group.
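
The weakness of a plain sum check in the most significant bit position, and the effect of end-around carry, can be seen in a short example. The 16-bit word length matches the text; the block contents and function names are assumptions made for illustration.

```python
MASK = 0xFFFF  # 16-bit words

def plain_sum(words):
    """16-bit sum with the carry out of the top bit discarded."""
    return sum(words) & MASK

def end_around_sum(words):
    """16-bit sum in which each carry out of the top bit is added back in."""
    total = 0
    for w in words:
        total += w & MASK
        total = (total & MASK) + (total >> 16)  # fold the carry around
    return total

stored = [0x0001, 0x0002, 0x0003, 0x0004]
# Single-column fault: the most significant bit is stuck at 1 in two words.
faulty = [0x8001, 0x8002, 0x0003, 0x0004]

plain_detects = plain_sum(stored) != plain_sum(faulty)
end_around_detects = end_around_sum(stored) != end_around_sum(faulty)
```

In the plain sum the two extra top bits add up to a carry that is thrown away, so the check sum is unchanged. With end-around carry the same carry re-enters at the least significant bit and the mismatch is seen.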


PREFLIGHT  AND  POST-FLIGHT  TEST  TECHNIQUES 

Most  of  the  test  techniques  discussed  as  in-flight  monitoring  techniques  are  also  applicable  to  post- 
flight  and  preflight  testing.  There  is,  however,  in  post-flight  and  preflight  testing  greater  freedom  to 
stimulate  the  item  under  test.  Greater  periods  of  uninterrupted  time  are  available  for  post-flight  and 
preflight testing. The use of computing capability of other on-board systems may be feasible ("global
BITE").

Preflight  and  post-flight  test  objectives  place  greater  emphasis  on  precision  and  fault  isolation 
than  is  necessary  for  in-flight  monitoring. 

Wrap-Around Testing

A digital  flight  control  system  typically  provides  for  analog,  digital  and  discrete  signal  inputs  and 
for  similar  outputs.  For  post-flight  and  preflight  testing  it  is  feasible  to  connect  outputs  to  inputs  and 
perform  a complete  end-to-end  test  of  the  complete  digital  flight  control  system  with  the  exception  of 
parts of some sensors.

A complete  end-to-end  test  of  a digital  flight  control  system  would  begin  with  a self-test  program 
which verifies the operation of the instruction set of the central processor. Testing would progress outward
from this point, gradually adding elements of the system until finally transmission of information
around the entire loop from processor to output to input to processor was verified. This process is
illustrated in Figure 13. A carry-on operator control panel may be utilized if one is not an inherent part
of the flight equipment. Embedded in the gross flow chart of Figure 13 are most of the test techniques
discussed as suitable for in-flight monitoring.
Figure 13. Wraparound Test

CONCLUSION

The techniques useful for in-flight monitoring of digital flight control systems can be categorized as
dynamic model or fixed model techniques. Built-in-test equipment utilizing these techniques can
provide for in-flight and on-board testing with minimal carry-on equipment and no roll-up flight line
equipment.
'Pre-Flight Dynamic Checkout'
D.R. Towill

(U.W.I.S.T., Cardiff, U.K.)


Summary  The  advent  of  Fast  Fourier  Transform  algorithms  coupled  to  the 
increasing  availability  of  data  processing  facilities  has  resulted  in 
'transfer  function  testing'  increasing  enormously  in  popularity.  System 
impulse  response,  step  response,  or  frequency  response  are  all  satisfactory 
dynamic  signatures  related  to  the  transfer  function  of  the  system  under  test. 
As  discussed  in  the  paper,  it  is  sufficient  for  many  pre-flight  checkout 
applications  to  estimate  the  signature  at  just  a few  carefully  selected  data 
points,  thus  considerably  reducing  test  time  and  computational  capacity 
required.  This  in  turn  permits  the  use  of  special  purpose  hardware  such  as 
Fourier  Response  Analysers  when  justified  on  cost-benefit  or  logistic  grounds 
as  an  alternative  to  spectral  analysis  methods.  The  paper  provides  the  basic 
ground  rules  for  selecting  system  test  stimuli  and  test  features  for  both 
manual  and  automatic  testing. 
1. Introduction

Very  many  aircraft  and  avionic  systems  are  required  to  suitably  respond  to  some  time  varying  input 
('the  message'),  and  to  suitably  reject  other  time  varying  inputs  ('disturbances')  in  order  to  properly 
perform  the  operational  role  for  which  they  are  intended.  For  example  in  an  automatic  terrain  following 
control system (ATFCS), the terrain profile plus set clearance represents the 'message', and the
'disturbances'  will  arise  from  such  sources  as  sensor  noise,  internally  generated  noise  due  to  discrete 
signals  in  the  command  computer,  and  wind  gusts  tending  to  deflect  the  aircraft  off  course.  The  total 
ATFCS  is  designed  to  adequately  discriminate  between  the  'message'  and  'disturbances'  so  as  to  achieve 
the  desired  level  of  mission  effectiveness  which  in  this  case  is  the  best  balance  between  the  probabilities 
of  'terrain  clobber'  and  'mission  abort'. 

In order to achieve an adequate design for the ATFCS, the system designer must represent both message
and disturbances by realistic time varying signals. It is intuitively obvious (and substantiated by
many company confidential system simulation studies) that dynamic tests are needed to rapidly establish
the operational status of the system, and it is the purpose of this paper to review currently used techniques
suitable for pre-flight dynamic testing. Although there is a swing towards computer controlled
automatic  test  equipment  (ATE)  implementation  of  dynamic  testing  because  of  the  advantage  of  speed  at 
which  test  data  becomes  available,  there  are  plenty  of  situations  in  which  it  is  acceptable  to  have  dynamic 
testing  undertaken  either  in  the  manual  or  the  built-in-test  mode  of  operation.  With  one  exception,  all 
examples  given  in  this  paper  are  for  'open-loop'  testing  in  that  the  aircraft  is  not  part  of  the  test  loop. 
The  system  under  test  (SUT)  may,  of  course,  be  a complex  multi-loop  feedback  system  which  would  naturally 
be  tested  with  its  own  feedback  paths  closed.  In  breaking  down  such  a complex  system  as  the  ATFCS  for 
pre-flight  testing,  the  dynamic  test  requirements  must  similarly  be  partitioned  between  the  various  SUT's 
such  as  compensation  units,  amplifiers,  mixers,  and  servoactuators  in  accordance  with  the  permissible 
transient  errors  for  each  unit.  All  SUT's  used  as  practical  examples  in  the  text  represent  analogue 
hardware,  but  the  advent  of  digital  autopilots  should  not  invalidate  the  principles  of  dynamic  testing 
for  pre-flight  checkout  since  preliminary  studies  have  already  indicated  that  hybrid  systems  may  be  tested 
using  pseudo-noise  signals  and  cross-correlation. (1) 

2.  What  is  Dynamic  Testing? 

The  dynamic  response  of  the  system  under  test  is  defined  as  the  behaviour  of  the  system  when 
stimulated  by  a time  varying  input  such  as  the  unit  step  or  one  of  the  many  alternative  signals  which  will 
be considered in a later section. Consequently, a dynamic test is any test which yields information on
the  dynamic  response  of  the  SUT,  even  if  the  data  yielded  does  not  completely  describe  the  dynamic 
behaviour  of  the  system.  Thus  if  only  the  final  value  of  the  response  of  the  SUT  is  measured,  then  the 
test  would  be  classified  as  a static  test  only,  whereas  if  the  behaviour  of  the  SUT  is  sampled  at  various 
times  during  the  transient  response,  then  the  test  would  be  classified  as  a dynamic  test. 

Fig. 1 shows the step response of an autostabiliser unit, such as might form part of the ATFCS, as
recorded  on  an  oscilloscope.  Superimposed  is  a checkout  'mask'  which  the  test  technician  uses  to 
categorise  the  system  into  'healthy'  or  'sick'  status  according  to  whether  or  not  the  response  crosses  the 
boundary.  In  automatic  tests,  the  response  of  the  autostabiliser  is  sampled  at  a few  discrete  points  in 
time, (2)  and  a judgement  made  by  comparison  with  a set  of  checkout  gates,  or  by  reference  to  a decision 
surface,  using,  for  example,  the  nearest  neighbour  rule.  The  choice  of  test  features  (in  this  example 
the  features  are  the  delay  times  at  which  the  step  response  is  sampled)  is  crucial  in  arriving  at  a high 
level  of  correct  classification.  In  the  language  of  pattern  recognition  theory,  we  are  seeking  test 
features which readily discriminate between 'sick' and 'healthy' systems, as shown in the two dimensional
case of Fig. 2(a), and wish to discard test features such as those shown in Fig. 2(b) which confuse the
status of the SUT. Fortunately, in avionics, unlike medicine, the system is created by a known designer,
and mathematical and functional models exist. Providing the analysis is done at the system design stage,
it is relatively straightforward to select adequate test features by simulating the response of samples of
'sick' and 'healthy' systems. Critical regions of the response can then be identified, as shown in
Fig. 3, and competitive feature sets compared on a statistical basis. It is also clear from Fig. 3 that
the static test only tells us that the system is operating, not whether or not the system is operational,
i.e. will function as intended in the real life operational role.
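
The two automatic decision methods mentioned in this section, checkout gates and the nearest neighbour rule, can be sketched as follows. The gate limits, sample values and labelled examples are invented for illustration and are not taken from the paper; the function names are likewise assumptions.

```python
def passes_gates(features, gates):
    """True if every sampled value lies inside its (low, high) checkout gate."""
    return all(low <= f <= high for f, (low, high) in zip(features, gates))

def nearest_neighbour(features, labelled_examples):
    """Return the label of the closest stored example (squared Euclidean distance)."""
    def dist2(example):
        stored_features = example[0]
        return sum((f - s) ** 2 for f, s in zip(features, stored_features))
    return min(labelled_examples, key=dist2)[1]

# Hypothetical gates on three samples of a step response.
gates = [(0.40, 0.70), (1.00, 1.30), (0.90, 1.10)]
healthy_unit = [0.55, 1.15, 1.00]
sick_unit = [0.55, 1.45, 1.00]      # excessive overshoot at the second sample

# Hypothetical simulated examples of known status.
examples = [([0.55, 1.15, 1.00], "healthy"), ([0.50, 1.50, 1.05], "sick")]
verdict = nearest_neighbour([0.56, 1.20, 1.00], examples)
```

Gates give an independent pass or fail limit per feature, whereas the nearest neighbour rule compares the whole feature vector with stored examples of known status.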
3. Why Use Dynamic Testing?

The  main  reasons  for  using  dynamic  testing  are  as  follows: 

(a)  User  Confidence 

Avionic  systems  are  designed  to  respond  to  real  time  varying  input  signals,  so  that  a high  degree  of 
correlation  exists  between  mission  success  and  the  setting  of  suitable  dynamic  performance  specifications. 
For  example,  in  analysing  the  integrity  of  aircraft  all-weather  landing  systems  required  to  achieve  better 
safety  standards  than  one  accident  in  10^  landings,  it  is  possible  to  allocate  a fraction  of  the  permissible 
touchdown  error  to  the  dynamic  response  of  the  system. (3)  After  extensive  simulator  studies,  it  is  then 
possible  to  determine  a frequency  response  for  the  system  which  satisfies  the  'false  abort'  case  due  to 
guidance  signal  noise,  and  to  check  that  this  design  produces  suitable  dynamic  recovery  of  the  aircraft 
when disturbed by a wind gust. A dynamic performance specification for the all-weather landing system
implies in turn the existence of a dynamic performance specification for all the major sub-systems
written to ensure that the overall requirement is met.

(b)  Spares  Inventory  Reduction 

If only static performance tests are used to establish the status of a system which has to meet a
dynamic operational capability, in order to confirm the integrity of the system it is found that static
tolerances have to be made excessively tight. Consequently many 'healthy' systems are wrongly categorised
as 'sick', resulting in setting up an excessively large spares inventory in order to provide a reasonable
level of system availability. (4)
(c) Repair Costs Reduction

It follows from the previous paragraph that the repair load on the maintenance depot will be reduced
because a smaller proportion of 'healthy' SUT's will be wrongly sent back for stripdown and repair. In
addition, for SUT's correctly classified as 'sick', it is possible to infer from dynamic test results the
possible causes of failure (5), (6), thus reducing fault locate time and hence repair costs.

(d)  Increased  System  Reliability  via  Component  Reduction 


Dynamic tests can be designed to reduce the need for intermediate access points used to inject or
monitor  signals  needed  in  static  test  schemes.  Since  provision  of  access  points  degrades  system 
reliability,  their  omission  will  be  beneficial  in  this  respect. (7) 

(e)  Increased  System  Reliability  via  Failure  Prediction 

It has been stated that for integrity analysis of the Concorde autopilot, failures may reasonably be
categorised as having equal probability for 'catastrophic' changes and 'drift' changes in performance (8).
Although as yet there is no direct evidence that dynamic testing may help in predicting impending
'catastrophic' changes in performance, it has been suggested that prediction of gradual degradation via
regular testing and time series analysis is feasible, (9) so that potentially 'sick' SUT's may be removed
prior to failure and restored to a satisfactory condition.

4.  Transfer  Function  Models 

For linear systems, a transfer function model of the form,

$$H(s) = \frac{\sum_{i=0}^{q} b_i s^i}{\sum_{i=0}^{n} a_i s^i} \qquad [1]$$

where s is the Laplace operator, defines the response of the SUT for all possible inputs. For a test
stimulus X(t) with Laplace transform X(s), the response of the SUT may be written

$$Y(t) = \mathcal{L}^{-1}\left[ X(s)\, H(s) \right] \qquad [2]$$

Of  particular  interest  in  pre-flight  dynamic  testing  are  the  following  standard  responses:- 

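(a) the impulse response (or weighting function) defined by

$$h(t) = \mathcal{L}^{-1}\left[ H(s) \right]$$

(b) the previously met step response, defined by,

$$u(t) = \mathcal{L}^{-1}\left[ \frac{H(s)}{s} \right]$$

and (c) the ramp response, defined by,

$$\mathcal{L}^{-1}\left[ \frac{H(s)}{s^2} \right]$$
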
Of  these  responses,  the  step  response  is  the  one  most  commonly  used  in  avionic  systems  and  is  readily 
available  in  standard  form  for  a wide  variety  of  transfer  functions. (10)  It  should  be  noted  that  if 
the  system  is  mildly  non-linear,  which  is  usually  the  case  with  present  day  design  and  manufacturing  skill 
levels,  the  transfer  function  model  may  still  be  a satisfactory  representation  of  the  SUT  under  the  stipulated 
test  conditions,  but  care  is  then  needed  in  interpretation  of  the  results  to  other  test  situations. (11) 
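
Since each division by s in the Laplace domain corresponds to an integration in time, the three standard responses can be obtained from one another numerically. The sketch below assumes a second order SUT, H(s) = wn^2 / (s^2 + 2 zeta wn s + wn^2), with parameter values chosen purely for illustration. It integrates the impulse response to recover the step response, and integrates the step response to obtain the steady state ramp error, which for this particular model is 2 zeta / wn.

```python
import math

ZETA, WN = 0.5, 2.0                    # assumed damping ratio and natural frequency (rad/s)
WD = WN * math.sqrt(1.0 - ZETA ** 2)   # damped natural frequency

def impulse(t):
    """Closed-form impulse response h(t) of the assumed second order SUT."""
    return WN / math.sqrt(1.0 - ZETA ** 2) * math.exp(-ZETA * WN * t) * math.sin(WD * t)

def step(t):
    """Closed-form unit step response u(t) of the same SUT."""
    decay = math.exp(-ZETA * WN * t)
    return 1.0 - decay * (math.cos(WD * t)
                          + ZETA / math.sqrt(1.0 - ZETA ** 2) * math.sin(WD * t))

def integrate(f, t_end, steps=20000):
    """Trapezoidal integral of f from 0 to t_end."""
    dt = t_end / steps
    total = 0.5 * (f(0.0) + f(t_end))
    for i in range(1, steps):
        total += f(i * dt)
    return total * dt

step_from_impulse = integrate(impulse, 2.0)   # should agree with step(2.0)
ramp_error = 20.0 - integrate(step, 20.0)     # input ramp minus ramp response, after the transient
```

The ramp error here is the small difference of two large numbers, which is why an estimate inferred by integrating a measured step response is sensitive to integration inaccuracies.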

The  popularity  of  the  step  input  is  partly  due  to  the  simplicity  of  signal  generation  and  partly  to 
the intuitive understanding of this response by system designers. (12) Direct impulse testing is not favoured
due to the problems of signal generation and excessive disturbance of the SUT, but as we shall see
later,  the  impulse  response  may  be  estimated  indirectly  via  pseudo-noise  sequence  injection  and  output- 
input  cross-correlation  which  overcomes  these  difficulties.  Ramp  testing  is  frequently  used  in  tracking 
systems  since  the  ramp  function  may  be  regarded  as  similar  to  operational  inputs  often  experienced  during 
at  least  part  of  the  mission.  The  steady  state  ramp  error  occurring  after  'lock-on'  is  then  of  particular 
interest. 


Although  in  theory  the  impulse,  step,  and  ramp  responses  contain  the  same  information  on  system 
performance,  in  practice  the  extraction  of  the  information  can  be  made  difficult  by  an  unsatisfactory  choice 
of  test  stimulus.  This  is  particularly  true  if  the  steady  state  ramp  error  is  inferred  from  the  SUT  step 
response,  since  any  integration  inaccuracies  will  affect  the  final  estimate  of  a relatively  small  quantity 
which is often regarded as of fundamental importance. When estimated from the ramp response, the steady
state error is, of course, obtained from a single measurement. A less obvious, but equally useful
observation on test stimulus selection concerns the amplification of SUT secondary resonances via impulse
testing (there will be applications where secondary resonances due to drive mechanisms, structural
deflections etc. will need to be assessed as part of the checkout test, and other applications where this
need not be done). To see why this is so, we write equation [1] into SUT pole-zero form.
where $\prod$ is the product sign. We do not commit ourselves at this stage on the specific breakdown between
real and complex factors in equation [3], so some of the limits are left open. The SUT zeros are given
by

(for complex z)

and the SUT poles (or 'modes') are given by

(for real p)
(for complex p)


The solution for Y(t) may now be written in terms of the residues $A_j$ (of which $\phi_j$ is a constituent
for complex modes), as would be obtained on solution of equation [2], as follows,

$$Y(t) = \sum_{j=1} A_j e^{-p_j t} + \sum_{j=1} A_j e^{-\zeta_j \omega_{nj} t} \sin\left( \omega_{nj} \sqrt{1 - \zeta_j^2}\; t - \phi_j \right) + \text{steady state terms} \qquad [6]$$

The  residues  depend  on  the  SUT  poles  and  zeros,  but  also  on  X(t),  and  it  is  their  dependence  on  X(t)  which 
can  be  turned  to  advantage  in  system  testing. 

As an example, consider the case of a fourth order SUT, in which there are two complex modes of
damping ratio $\zeta_1$ and $\zeta_2$ respectively, typically being lightly damped. In the s plane, the frequency
separation is $\omega_{n2} = \lambda\, \omega_{n1}$, where $\lambda \gg 1$, by definition of a secondary resonance. If $A_1$ and $A_2$ are the
residues at the system poles, it is readily shown that (13)

$$\left[ \frac{A_2}{A_1} \right]_{h(t)} \approx \lambda \left[ \frac{A_2}{A_1} \right]_{u(t)} \qquad [7]$$

The ramp residues are similarly attenuated with respect to the step response by a factor $\lambda$. Consequently,
the secondary mode, even when lightly damped, is heavily attenuated each time the test stimulus is
integrated, as shown in Fig. 4. Therefore one advantage of the impulse-like test is the exposure to view
of the higher resonances so that the system designer is forced to consider their implication (if any) on
operational performance. Although the theory behind equation 7 is based on a fourth order transfer
function, these attenuation effects are present irrespective of the order of the system, as confirmed in
the testing of a high order SUT. (14)
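
The attenuation of the secondary mode with each integration of the stimulus can be checked numerically from the residues. The sketch assumes an all-pole fourth order SUT with simple poles; the damping ratios, the frequency separation and the function names are illustrative assumptions. The gain constant is omitted because it cancels in the residue ratio.

```python
import math

def complex_pole(zeta, wn):
    """Upper-half-plane pole of an underdamped mode."""
    return complex(-zeta * wn, wn * math.sqrt(1.0 - zeta ** 2))

def residue(poles, k, integrations):
    """Residue at poles[k] of 1 / (s**integrations * prod(s - p)), simple poles assumed.
    integrations = 0, 1, 2 corresponds to impulse, step and ramp stimuli."""
    pk = poles[k]
    denom = pk ** integrations
    for m, pm in enumerate(poles):
        if m != k:
            denom *= pk - pm
    return 1.0 / denom

LAM = 10.0                                # assumed frequency separation of the two modes
p1 = complex_pole(0.5, 1.0)               # dominant mode
p2 = complex_pole(0.05, LAM * 1.0)        # lightly damped secondary mode
poles = [p1, p1.conjugate(), p2, p2.conjugate()]

def mode_ratio(integrations):
    """Magnitude of secondary-mode residue relative to dominant-mode residue."""
    return abs(residue(poles, 2, integrations) / residue(poles, 0, integrations))

ratio_h, ratio_u, ratio_r = mode_ratio(0), mode_ratio(1), mode_ratio(2)
```

Each integration divides the residue at a pole by that pole, so the relative size of the secondary mode falls by the ratio of the pole magnitudes, which equals the frequency separation.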

5.  Selection  of  Checkout  Features  for  Step  Testing 

It  is  clear  from  section  4 that  high  frequency  modes  are  generally  severely  attenuated  in  the  system 
step  response,  so  that  the  decision  to  implement  such  a checkout  test  implies  that  such  secondary  modes 
are  (a)  not  present,  (b)  are  unimportant,  or  (c)  are  checked  via  a separate  test.  As  a consequence,  the 
selection of suitable test features based on sampled values of the step response is simplified, so that
even if a simulation study is implemented to find a 'best' set of features as suggested in section 2, it
should be possible to start with a near optimum solution.

Based on a selection of studies on various SUT's the present author suggests that four sample times
should prove satisfactory for pre-flight checkout, since these time delays are generally sensitive (as a
set) to parameter changes. These are shown in Fig. 3.

$$\left[\, u(0.5\,t_{sp});\; u(t_{sp});\; u(1.5\,t_{sp});\; u(t_D) \,\right]^T \qquad [8]$$

where $t_{sp}$ is the time to first peak overshoot of the nominal SUT, and $t_D$ is the practical SUT decay time.
$[\;]^T$ is a shorthand notation for the feature vector (set of measurements) used in checkout decision making.

Even if the test designer is not operating under constraints imposed by system test time or computer
capacity, the discrete feature vector should not be made needlessly long, since there is a danger that
factors important to the operational efficiency of the SUT become submerged in a wealth of unimportant
detail.
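
A four-element step response feature vector of this kind can be computed for a nominal model as sketched below. Several things here are assumptions rather than statements from the paper: the SUT is taken to be second order, the third sample is taken at 1.5 times the time to first peak, and the "practical decay time" is taken as 4 / (zeta wn), a common settling-time rule of thumb.

```python
import math

def step(t, zeta, wn):
    """Unit step response of wn^2 / (s^2 + 2*zeta*wn*s + wn^2), underdamped case."""
    wd = wn * math.sqrt(1.0 - zeta ** 2)
    decay = math.exp(-zeta * wn * t)
    return 1.0 - decay * (math.cos(wd * t)
                          + zeta / math.sqrt(1.0 - zeta ** 2) * math.sin(wd * t))

def step_features(zeta, wn):
    """Samples of the step response at 0.5*t_sp, t_sp, 1.5*t_sp and t_d."""
    t_sp = math.pi / (wn * math.sqrt(1.0 - zeta ** 2))   # time to first peak overshoot
    t_d = 4.0 / (zeta * wn)                              # assumed practical decay time
    return [step(t, zeta, wn) for t in (0.5 * t_sp, t_sp, 1.5 * t_sp, t_d)]

ZETA, WN = 0.5, 2.0                                      # illustrative nominal parameters
features = step_features(ZETA, WN)
peak_value = 1.0 + math.exp(-ZETA * math.pi / math.sqrt(1.0 - ZETA ** 2))
```

The second element is the peak of the nominal response, so it responds mainly to damping changes, while the samples on either side of the peak respond to changes in speed of response.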

Ideally, the gate widths and test feature selection need confirmation from field trials or
simulation studies before final committal within the test schedule. One technique for initial gate width
selection is to assign realistic tolerances to all parameters, and then compute the expected boundaries of
performance variation of 'healthy' systems using sensitivity functions and assuming the parameter
independence rule to hold. (15) Hence if $a_j$ is the jth parameter, then the gate width $\pm g_i$ is given by,


where [...] are sensitivity functions, and $\Delta a_j$ is the % tolerance set on the jth parameter, there
being J parameters in all.
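
One way to turn parameter tolerances and sensitivity functions into initial gate widths is a root-sum-square combination, which is a common reading of an independence assumption. The sketch below adopts that reading as an assumption; it is not a reconstruction of the paper's equation. The second order model, the finite-difference sensitivities and all names are likewise illustrative.

```python
import math

def step(t, zeta, wn):
    """Unit step response of wn^2 / (s^2 + 2*zeta*wn*s + wn^2), underdamped case."""
    wd = wn * math.sqrt(1.0 - zeta ** 2)
    decay = math.exp(-zeta * wn * t)
    return 1.0 - decay * (math.cos(wd * t)
                          + zeta / math.sqrt(1.0 - zeta ** 2) * math.sin(wd * t))

def gate_widths(sample_times, nominal, tolerances_pct, rel_step=1e-6):
    """Root-sum-square gate half-widths at each sample time.
    nominal: parameter values (zeta, wn); tolerances_pct: % tolerance on each."""
    widths = []
    for t in sample_times:
        total = 0.0
        for j, tol in enumerate(tolerances_pct):
            up, down = list(nominal), list(nominal)
            delta = nominal[j] * rel_step
            up[j] += delta
            down[j] -= delta
            slope = (step(t, *up) - step(t, *down)) / (2.0 * delta)
            per_percent = slope * nominal[j] / 100.0   # response change per 1% of parameter j
            total += (per_percent * tol) ** 2
        widths.append(math.sqrt(total))
    return widths

times = [0.9, 1.8, 2.7, 4.0]                           # hypothetical sample times in seconds
w_base = gate_widths(times, (0.5, 2.0), (10.0, 5.0))
w_double = gate_widths(times, (0.5, 2.0), (20.0, 10.0))
```

Under this assumption the widths scale linearly with the tolerances, so doubling every tolerance doubles every gate.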

6.  Selection  of  Checkout  Features  for  Impulse  Testing 

If high frequency secondary oscillatory modes are not observable on the impulse response, feature
selection for checkout is straightforward, and in like manner to section 5, a feature vector with four
elements will often prove adequate. A reasonable set for initial investigation is,


T 

Qil  - 


where  t „ is  the  time  to  impulse  response  peak  of  the  nominal  SUT.  When  high  frequency  lightly  damped 
secondary  modes  are  observable  such  as  in  Fig. 4(a)  the  problem  becomes  much  more  complicated  because 
acceptable  changes  in  the  dominant  mode  can  render  gates  set  to  constrain  the  secondary  mode  ineffective. 

A possible  solution  is  to  supplement  the  feature  vector  of  equation  10  with  a further  three  features 
chosen  close  together  and  within  the  first  observable  period  of  secondary  oscillation. 

It  should  be  noted  that  unless  there  are  many  dynamic  performance  requirements  written  into  the  SUT 
operational  specification,  it  is  extremely  unlikely  that  the  number  of  test  features  needed  for  pre-flight 
checkout will approach the (n + q + 1) minimum needed for transfer function identification because the
information  on  whether  or  not  the  system  is  operational  is  contained  in  just  a few  measurements.  The 
needs  of  testing  for  pre-flight  checkout  and  design  proving  are  therefore  significantly  different  and 
this  fact  is  already  exploited  by  test  designers  since  it  is  known  that  many  existing  test  schedules  call 
for  as  few  as  six  data  points. 

7.  System  Frequency  Response 

If a stable linear system with transfer function H(s) is excited by a sinusoidal signal $a \sin \omega t$, it is
well known that after an initial transient phase, the system will settle down to a steady sinusoidal
response of the same frequency as the input. In general there will be a phase shift $\phi$ and the output
waveform will be of a different amplitude to the forcing function, so that in the absence of measurement
noise the output signal may be written as $ka \sin(\omega t + \phi)$. It is also readily shown that the solution of
the differential equation describing the steady state behaviour is obtained by substituting $j\omega$ for s in
equation [1]. This gives a rotating vector

$$H(j\omega) = H(s)\big|_{s = j\omega} \qquad [11]$$

which after writing in the form $(A + jB)/(C + jD)$ can be put in standard polar notation as

$$H(j\omega) = \sqrt{\frac{A^2 + B^2}{C^2 + D^2}} \;\angle\, \phi, \qquad \phi = \tan^{-1}(B/A) - \tan^{-1}(D/C) \qquad [12]$$


Equation [12] gives the information available from the steady state response to a single sinusoidal
input, $\phi$ being the aforementioned phase shift and $k = |H(j\omega)|$ being the system amplitude ratio. Because
both phase and amplitude ratio are available we have two test features per test frequency. It is
important to make use of both features if test time needs to be reduced. In calculating the theoretical
system frequency response, discrete values of $\omega$ are substituted into $H(j\omega)$ and the results plotted as a
function of $\omega$.
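
The calculation of the theoretical frequency response at discrete frequencies can be sketched as follows. The coefficient ordering (ascending powers of s), the second order example and the function names are assumptions made for illustration. `atan2` is used in place of a plain arctangent so that the quadrant of each term is handled correctly.

```python
import cmath
import math

def freq_response(num, den, w):
    """Amplitude ratio and phase of H(jw); num and den hold polynomial
    coefficients in ascending powers of s."""
    s = complex(0.0, w)
    numerator = sum(b * s ** i for i, b in enumerate(num))
    denominator = sum(a * s ** i for i, a in enumerate(den))
    h = numerator / denominator
    return abs(h), cmath.phase(h)

def polar_from_parts(A, B, C, D):
    """Amplitude ratio and phase of (A + jB) / (C + jD)."""
    k = math.sqrt((A * A + B * B) / (C * C + D * D))
    phi = math.atan2(B, A) - math.atan2(D, C)
    return k, phi

# Illustrative SUT: H(s) = 4 / (s^2 + 2s + 4), i.e. wn = 2 rad/s, zeta = 0.5.
k, phi = freq_response([4.0], [4.0, 2.0, 1.0], w=2.0)
# At w = 2 the numerator is 4 + j0 and the denominator is 0 + j4.
k_parts, phi_parts = polar_from_parts(4.0, 0.0, 0.0, 4.0)
```

At the natural frequency of this example the amplitude ratio is 1/(2 zeta) = 1 and the phase lag is 90 degrees, which both routes reproduce.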

If  an  individual  sine  wave  is  injected  into  the  SUT,  one  amplitude  ratio  and  one  phase  estimate  is 
made  available,  so  that  if  the  test  is  repeated,  a series  of  discrete  points  may  be  plotted  in  exactly 
the  same  manner  as  for  the  calculated  values.  This  is  the  serial  method  of  test,  in  which  we  wait  until 
the  system  has  settled  at  each  test  frequency  before  a measurement  is  made.  Total  test  time  is  therefore 
some non-linear function of settling time multiplied by the number of test frequencies needed. At the
design and development stage of a system, a wide ranging frequency response plot such as shown in Fig. 5 is
essential  for  proving  the  design,  and  for  finger-printing  purposes.  Test  time  for  such  wideband 
information  can  be  considerably  reduced  still  using  an  essentially  serial  mode  technique  in  which  the 
sinusoidal  input  has  a slowly  time  varying  frequency  of  excitation,  (16)  and  this  has  proved  an  extremely 
useful  method  in  the  past  even  when  using  such  crude  displays  of  the  return  signal  as  chart  recorders. 

The  slow  sweep  method  is  based  on  the  fact  that  by  careful  choice  of  frequency  sweep  characteristics, 
the  envelope  of  the  system  output,  when  plotted  as  a function  of  time,  approximates  to  the  system  amplitude 
ratio,  and  the  phase  shift  may  be  recovered  as  well.  With  such  a slow  sweep  technique  a satisfactory  test 
time for a second order system with natural frequency $\omega_n$ r/s would be of the order of $100/\omega_n$ seconds.
The slow sweep frequency may be obtained using special purpose instruments, or from a computer controlled
Fourier Response Analyser (FRA) of the type to be described in the next section.

8. Correlation Techniques in Frequency Response Measurement

The serial mode frequency response method of system testing is now well established on a world-wide
basis, and many pre-flight checkout applications have been reported. As far back as 1961, the method
was a recommended test for USAF equipment, (17) and its use has mushroomed ever since. Although the method
may be implemented in a wide variety of ways, (18) including the use of oscillators, phase variable filters
and oscilloscope displays in a manual test mode, and sampling plus counting techniques in digital computer
aided test stations, the most universally used technique involves correlation, which is needed to reduce
the effect of measurement noise on gain and phase estimates. The frequency domain test instrument
designed to exploit the correlation principle has become known as the Fourier Response Analyser (FRA)
because the gain and phase estimators are identical to the Fourier series coefficients used to describe a
repetitive waveform.


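By definition, the cross-correlation function $\phi_{xy}(\tau)$ of two signals X(t) and Y(t) is,

$$\phi_{xy}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T X(t - \tau)\, Y(t)\, dt \qquad [13]$$

For sinusoidal excitation we have,

$$X_s(t) = a \sin \omega t \qquad [14]$$

with a noisy return signal from the SUT of,

$$Y(t) = k\,a \sin(\omega t + \theta) + n(t) \qquad [15]$$

We also require the cosine signal,

$$X_c(t) = a \cos \omega t \qquad [16]$$

to be available as well, which is readily achieved using a slave oscillator or a time delay unit. It is
implicit that the integration time, T, is chosen to be an integer multiple of the period of the input sine
wave, so that $T = 2\pi N/\omega$. We now correlate the return signal (equation [15]) with the sine and
cosine signals of equations [14] and [16] respectively, and set $\tau = 0$. (19) The result is

$$\phi_{x_s y}(0) = \frac{1}{T} \int_0^T Y(t)\, a \sin \omega t \, dt = \frac{k a^2}{2} \cos\theta + \frac{a}{T} \int_0^T n(t) \sin \omega t \, dt \qquad [17]$$

which is the in-phase estimate, and

$$\phi_{x_c y}(0) = \frac{1}{T} \int_0^T Y(t)\, a \cos \omega t \, dt = \frac{k a^2}{2} \sin\theta + \frac{a}{T} \int_0^T n(t) \cos \omega t \, dt \qquad [18]$$

which is the quadrature estimate.

On the assumption that noise may be neglected, the gain and phase estimators become,

$$\hat{k} = \frac{2}{a^2} \left\{ [\phi_{x_s y}(0)]^2 + [\phi_{x_c y}(0)]^2 \right\}^{1/2} \qquad [19]$$

which is, of course, the estimate of $|H(j\omega)|$, and

$$\hat{\theta} = \tan^{-1} \left[ \frac{\phi_{x_c y}(0)}{\phi_{x_s y}(0)} \right] \qquad [20]$$

which is the estimate of the phase of $H(j\omega)$.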


A block  diagram  of  the  FRA  principle  is  shown  in  Fig. 6.  Having  been  available  commercially  in  analogue 

form  for  many  years,  it  is  now  available  from  several  manufacturers  in  digital  form  with  computer  control 
capability  for  use  in  an  automatic  test  set. 
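The correlation estimators described above can be tried out numerically. The following Python fragment is a minimal sketch under stated assumptions, not a description of any commercial FRA: the function name `fra_estimate`, the replacement of the integrals by sums over equally spaced samples, and all numerical test values are illustrative choices that do not come from the original text. The record is assumed to span a whole number of stimulus cycles.

```python
import math


def fra_estimate(samples, a, omega, dt):
    """Estimate gain k and phase theta (radians) of a return signal.

    samples[i] is the return signal Y(i*dt); the record is assumed to
    span a whole number of cycles of the stimulus a*sin(omega*t).
    """
    n = len(samples)
    # Correlation with the sine reference: approximates (k*a^2/2)*cos(theta)
    in_phase = sum(y * a * math.sin(omega * i * dt)
                   for i, y in enumerate(samples)) / n
    # Correlation with the cosine reference: approximates (k*a^2/2)*sin(theta)
    quadrature = sum(y * a * math.cos(omega * i * dt)
                     for i, y in enumerate(samples)) / n
    k = (2.0 / a ** 2) * math.hypot(in_phase, quadrature)
    theta = math.atan2(quadrature, in_phase)
    return k, theta


# Hypothetical check: 1 Hz stimulus, SUT gain 0.5 and phase -0.7 rad, plus a
# third harmonic in the return signal which the correlation should reject.
a, omega, dt = 2.0, 2.0 * math.pi, 0.01
cycles = 5
n = int(round(cycles * 2.0 * math.pi / omega / dt))
signal = [0.5 * a * math.sin(omega * i * dt - 0.7)
          + 0.3 * math.sin(3.0 * omega * i * dt + 0.2) for i in range(n)]
k_hat, theta_hat = fra_estimate(signal, a, omega, dt)
```

`math.atan2` is used in place of a plain arctangent so that the phase estimate falls in the correct quadrant; this is a convenience of the sketch rather than something stated in the text.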


The correlation process may be regarded as a filter through which we can observe a cleaned up version
of a noisy return signal, thereby providing reasonable estimates of k and $\theta$. An important feature of
the FRA is the ability to reject harmonics present in the return signal, since,

$$\int_0^T \sin(n\omega t + \theta_n) \sin \omega t \, dt = 0$$

$$\int_0^T \sin(n\omega t + \theta_n) \cos \omega t \, dt = 0 \qquad [21]$$

(for n = 2, 3, 4, ...)

At non-harmonic noise frequencies, the errors in $\phi_{x_s y}(0)$ and $\phi_{x_c y}(0)$ may be calculated
from theoretical considerations (20) as can the effect of white noise on the in-phase and quadrature
measurements (21). The effect of sinusoidal and white noise on gain and phase estimates k and $\theta$ is
much more difficult to predict. One fact which does emerge strongly, however, is that measurement variance
is reduced as measurement time is increased, so that N is chosen to achieve adequate noise rejection. In
field work the present author has found the central limit theorem ($\sigma$ proportional to $1/\sqrt{N}$)
to be a reasonable guide to measurement time selection, (22) as shown in the example of Fig. 7, which also
illustrates the point that although correlation greatly reduces the effect of noise, in the practical
situation perfect filtering is unlikely to be achieved.
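The inverse square root dependence of the measurement scatter on the number of cycles N can be illustrated by a small Monte Carlo experiment. This is a sketch only: the function name `in_phase_noise_std`, the choice of Gaussian sample noise, 20 samples per cycle, 200 trials and the fixed seed are all assumptions made for the illustration, and the return signal is taken to be pure noise so that only the noise term of the in-phase estimate is seen.

```python
import math
import random


def in_phase_noise_std(cycles, samples_per_cycle=20, noise_sd=1.0,
                       trials=200, seed=1):
    """Sample standard deviation of the in-phase estimate for pure noise."""
    rng = random.Random(seed)
    n = cycles * samples_per_cycle
    reference = [math.sin(2.0 * math.pi * i / samples_per_cycle)
                 for i in range(n)]
    estimates = []
    for _ in range(trials):
        # Correlate one noise record with the sine reference
        estimates.append(sum(rng.gauss(0.0, noise_sd) * s
                             for s in reference) / n)
    mean = sum(estimates) / trials
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / (trials - 1))


# Increasing N from 1 to 16 cycles should cut the scatter by roughly 4.
ratio = in_phase_noise_std(1) / in_phase_noise_std(16)
```

Because the result is a statistical estimate, `ratio` will only be near 4 rather than equal to it.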


9.  Selection of Checkout Features for Frequency Response


If  the  serial  mode  frequency  domain  technique  is  used  in  situations  of  limited  SUT  test  time,  it 
becomes  extremely  important  to  select  a few  frequencies  which  will  yield  the  necessary  confidence  in  the 
operational  status  of  the  system.  Fortunately,  in  the  frequency  domain,  this  is  not  difficult,  because 
the  operational  function  of  the  SUT  partitions  nicely  into  the  following  three  regions  (22)  as  illustrated 
in  Fig. 5. 


(a)  low-frequency region where the SUT generally is expected to track the input closely. In order
to obtain the necessary resolution it is useful in a feedback system to monitor the error signal directly
rather than estimate the error from input and output measurements. The information yielded from the low
frequency region relates to time domain data in the decay time region, and can usually be obtained from
one test frequency at about 5% bandwidth.


(b)  mid-frequency region encompassing peak amplitude ratio ($M_p$) and bandwidth ($\omega_b$). This region
primarily determines such important time domain performance characteristics as $I_p$, $S_p$, $R_p$ (ramp
peak), and the times at which these peaks occur, showing that on a practical basis, time and frequency
domain test methods are interchangeable. These time domain criteria can be adequately constrained by three
test frequencies $\omega_{45^\circ}$, $\omega_{90^\circ}$, $\omega_{135^\circ}$ chosen on the basis of SUT
nominal performance so that the feature vector is

$$\left[\, |H(j\omega)|_{45} \; ; \; \theta_{45} \; ; \; |H(j\omega)|_{90} \; ; \; \theta_{90} \; ; \; |H(j\omega)|_{135} \; ; \; \theta_{135} \,\right]^T \qquad [22]$$

The use of three such test frequencies for pre-flight SUT checkout is very popular since it is an adequate
feature vector for many SUT's. In addition to FRA and digital computation methods, analogue based BITE
(built-in-test-equipment) designed only to test at such selected frequencies is also used.

(c)  high frequency region well beyond bandwidth, encompassing noise rejection and secondary
resonance requirements. In the time domain, the high frequency noise rejection is compressed into the
region around t = 0, and the secondary modes are superimposed in the manner already seen in Fig. 4, but in
the frequency domain the secondary mode of Fig. 5 is well separated from the dominant mode, and can be
detected for checkout purposes by three further test frequencies, one at the nominal value of modal
frequency, plus one either side. (14) High frequency noise rejection beyond bandwidth can be ascertained
from one or two test frequencies depending on the rate of roll-off sought, so that an extensive test
schedule may appear as shown in Table I.


Table I.  Derivation of Frequency Domain Test Schedule for Complex SUT

We  therefore  conclude  that  frequency  domain  feature  selection  relates  naturally  to  the  performance 
specification  for  the  SUT,  and  to  the  design  procedure  as  well,  since  classical  frequency  domain  techniques 
remain  popular  for  this  purpose.  If  dynamic  errors  are  the  only  performance  aspects  needing  checkout,  as 
is  frequently  the  case,  then  three  test  frequencies  are  sufficient.  Noise  rejection  and  tracking 
performance  each  require  at  least  one  more  test  frequency.  A secondary  resonance  requires  three  further 
test  frequencies,  but  unless  the  measurement  noise  is  severe  the  test  time  is  not  excessive.  If  more  than 
one  secondary  resonance  is  important  either  the  swept  frequency  method  or  the  spectral  analysis  method  to 
be  discussed  later  is  suggested. 
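The counting rule summarised in the paragraph above can be written down directly. The Python function below is a small illustrative sketch; its name and arguments are hypothetical, and it returns the minimum number of test frequencies implied by the text (three for dynamic errors, at least one more each for noise rejection and tracking, three per secondary resonance).

```python
def count_test_frequencies(secondary_resonances=0, noise_rejection=False,
                           tracking=False):
    """Minimum number of test frequencies for a serial-mode checkout."""
    count = 3                      # mid-frequency region: dynamic errors
    if noise_rejection:
        count += 1                 # at least one point beyond bandwidth
    if tracking:
        count += 1                 # one low-frequency point
    count += 3 * secondary_resonances   # nominal modal frequency, one either side
    return count


basic = count_test_frequencies()
extensive = count_test_frequencies(secondary_resonances=1,
                                   noise_rejection=True, tracking=True)
```

With one secondary resonance plus noise rejection and tracking checks, the schedule grows from three to eight frequencies.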

10.  Indirect  Impulse  Testing  Via  PNS  Excitation  and  Cross-Correlation 

10. 1 Development  of  the  Convolution  Integral  in  Terms  of  Input  Autocorrelation  Function 

In  recent  years  the  pulse  testing  time  domain  and  serial  mode  frequency  domain  techniques  have  been 
rivalled  by  the  appearance  of  pseudo-noise  test  signals,  which  for  linear  systems,  can, via  the  cross- 
correlation principle,  yield  under  specified  conditions  a realistic  approximation  to  the  system  impulse 
response  without  the  physical  injection  of  an  impulse  stimulus.  (23) (24)  The  theoretical  basis  for  this 
work  is  the  convolution  integral,  so  that  if  X(t)  and  Y(t)  are  the  SUT  input  and  output  signals  at  time  t, 
and  h(t)  is  the  SUT  impulse  response  as  before,  then 

$$Y(t) = \int_0^{\infty} h(\tau_1)\, X(t - \tau_1)\, d\tau_1 \qquad [23]$$

If we now combine equation 23 (convolution) with equation 13 (cross-correlation), we may write,

$$\phi_{xy}(\tau) = \frac{1}{T} \int_0^T X(t - \tau) \left[ \int_0^T h(\tau_1)\, X(t - \tau_1)\, d\tau_1 \right] dt \qquad [24]$$

interchanging the order of integration we have,

$$\phi_{xy}(\tau) = \int_0^T h(\tau_1) \left[ \frac{1}{T} \int_0^T X(t - \tau_1)\, X(t - \tau)\, dt \right] d\tau_1$$

which can be written in final form as,

$$\phi_{xy}(\tau) = \int_0^T h(\tau_1)\, \phi_{xx}(\tau - \tau_1)\, d\tau_1 \qquad [25]$$

where $\phi_{xx}(\tau)$ is the autocorrelation function of the input signal averaged over measurement time T.
Equation 25 is the Wiener-Hopf equation, and in particular, if $\phi_{xx}(\tau)$ is the unit impulse, then
$\phi_{xy}(\tau) = h(\tau)$ and equation 25 is an impulse response estimator for the SUT.


Fig. 8.  Removing uncertainty in $\phi_{xy}$ by using PNS stimulus.

Fig. 9.  PNS characteristics: (a) PNS sample; (b) PNS autocorrelation function.

Fig. 10.  Implementation of impulse testing via PNS injection and cross-correlation (serial mode and
parallel mode implementations).


10.2  PNS  Characteristics 

White noise has the unit impulse autocorrelation function required by equation 25 but
unfortunately infinite measurement time is implied for satisfactory estimates. As can be seen from
Fig. 8, the uncertainty due to the test signal can be removed by using a pseudo-noise sequence (PNS) with
precisely defined statistical properties which sufficiently approximate to white noise for the purpose
of dynamic testing. (24) Two-level sequences (PRBS) are particularly attractive for this purpose since
they are easily generated by shift registers incorporating the necessary feedback and operating on
modulo 2 arithmetic. The resultant test signal and autocorrelation function are shown in Fig. 9, and the
schematic mechanisation is shown in Fig. 10, the method of test being either serial or parallel mode
depending either on the number of delay lines provided, or on the provision of intermediate storage prior
to computation. The autocorrelation function $\phi_{xx}(\tau)$ is a function of the clock period
$\Delta t$, and the amplitude of the pulse $\pm a$, which may be conveniently expressed as follows: (25)

$$\phi_{xx}(\tau) = \frac{(N + 1)}{N}\, a^2 \Delta t \, \delta(\tau) - \frac{a^2}{N} \qquad [26]$$

where $\delta(\tau)$ is the Dirac delta function. The 'spike' in equation 26 repeats with a periodicity
$N\Delta t$, where $N = (2^R - 1)$, R being the length of the generator shift register. R = 10 is a common
length, giving a test sequence periodicity of $1023\,\Delta t$, but success with a specific range of SUT's
has been achieved with R as low as 6, giving N = 63 bits. As R increases, the error in estimation of h(t)
due to d.c. offset is reduced, and the need for post-test correction avoided. However, this is a minor
reason in the choice of N, since post-test correction is not difficult, and the problem may be avoided
altogether by the use of inverse repeat sequences.
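A two-level sequence of this kind is easily simulated. The Python sketch below generates one period of a sequence from a shift register with modulo 2 feedback and evaluates its circular autocorrelation. The feedback connections (stages 6 and 5 of a 6-stage register) are an assumption chosen because they give a maximal-length sequence; the text does not say which connections were used in practice, and the function names are illustrative.

```python
def prbs(taps, stages, amplitude=1.0):
    """Return one period (2**stages - 1 values) of a two-level sequence."""
    state = [1] * stages               # any non-zero starting state will do
    sequence = []
    for _ in range(2 ** stages - 1):
        sequence.append(amplitude if state[-1] else -amplitude)
        feedback = 0
        for tap in taps:               # modulo 2 sum of the tapped stages
            feedback ^= state[tap - 1]
        state = [feedback] + state[:-1]
    return sequence


def circular_autocorrelation(x, shift):
    """Average of x[i] * x[i + shift] over one complete period."""
    n = len(x)
    return sum(x[i] * x[(i + shift) % n] for i in range(n)) / n


seq = prbs(taps=(6, 5), stages=6)      # R = 6, N = 63
peak = circular_autocorrelation(seq, 0)
floor = [circular_autocorrelation(seq, s) for s in range(1, len(seq))]
```

For unit amplitude the sampled autocorrelation is a^2 at zero shift and -a^2/N at every other shift within the period, which is the sampled counterpart of the spike and the d.c. offset term in equation 26.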


10.3  Matching the PNS to the SUT

If the system delay time is less than $N\Delta t$, the cross-correlation function for the SUT excited by
two-level PNS is, (25)

$$\phi_{xy}(\tau) = \frac{a^2 (N + 1) \Delta t}{N}\, h(\tau) - \frac{a^2}{N} \int_0^{N\Delta t} h(t)\, dt \qquad [27]$$

and it is clearly useful to reduce or eliminate the second term on the right hand side of equation [27] by
making N large, or by using inverse repeat sequences. However, in selecting PNS characteristics for
system testing, it is useful to place the graphical interpretation of Fig. 11 on equation [25] so that the
various sources of error other than d.c. offset may be understood. These errors can be dealt with in
three parts.

(a)  equation [25] implies integration over all time: in the practical situation this means that the
product of the two functions must be zero outside the time span of integration, which in turn requires the
second spike to occur after the response is over. The PNS sequence length must therefore be somewhat
longer than the decay time of the system.

(b)  the initial value will, in general, be in error, because the triangular autocorrelation function
is centred at the origin, so that the product will differ from the true impulse response. Even for a
narrow autocorrelation function, there will be an error between the actual and estimated impulse response
as can be seen in Fig. 8. This particular source of error disappears for $t > \Delta t$.

(c)  errors due to the finite width of the autocorrelation function clearly depend on the behaviour
of h(t) in any time interval $2\Delta t$. From Fig. 11 it is clear that distortions can take place in
regions of high rates of change of h(t). In particular, oscillations present on the impulse response which
have a period comparable with the pulse width are removed by the multiplying and integrating action of
convolution as will be seen in section 10.5. As would also be expected from an information theory approach,
$\Delta t$ must be chosen by considering the highest frequency likely to be present in the SUT impulse
response, and the following analysis has proved helpful in making the choice.

In a theoretical study undertaken to detect oscillatory modes, it has been shown that in order to
detect the peak to within 1%, the ratio of (modal period/clock period) must be about 20:1. (26) As this
ratio decreases, the accuracy of estimation falls off rapidly, as shown in Table II, a ratio of 10:1
appearing to be a reasonable compromise choice for good resolution. The existence of secondary modes can,
of course, be detected with a much lower ratio of (modal period/clock period) than 10:1, as a number of
case studies have shown, but the secondary mode is then greatly attenuated compared to the true impulse
behaviour.
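The estimation of an impulse response by PNS excitation and cross-correlation, including the d.c. offset term, can be exercised in discrete time. The Python sketch below makes several assumptions that are not in the original text: the SUT is represented by a short list of impulse response weights, the clock period is taken as unity, the response is assumed periodic (the SUT has been 'initialised' by a previous sequence), and the feedback connections (6, 5) are one maximal-length choice. The offset correction uses the fact that, in this discrete setting, the sum of the cross-correlation over one period equals the offset term; it is offered as one possible post-test correction, not as the method used by the author.

```python
def prbs(taps, stages, amplitude=1.0):
    """One period of a two-level shift register sequence."""
    state = [1] * stages
    sequence = []
    for _ in range(2 ** stages - 1):
        sequence.append(amplitude if state[-1] else -amplitude)
        feedback = 0
        for tap in taps:
            feedback ^= state[tap - 1]
        state = [feedback] + state[:-1]
    return sequence


def estimate_impulse_response(h_true, stages=6, taps=(6, 5), a=1.0):
    """Recover impulse response weights from PNS input and SUT output."""
    x = prbs(taps, stages, a)
    n = len(x)
    # Periodic steady-state output of the (hypothetical) discrete SUT
    y = [sum(h * x[(k - m) % n] for m, h in enumerate(h_true))
         for k in range(n)]
    # Cross-correlation over one complete sequence for every delay j
    phi_xy = [sum(x[(k - j) % n] * y[k] for k in range(n)) / n
              for j in range(n)]
    offset = sum(phi_xy)               # equals (a^2 / N) * sum of h
    scale = a * a * (n + 1) / n        # weight of the autocorrelation spike
    return [(p + offset) / scale for p in phi_xy]


h_true = [0.0, 1.0, 0.6, 0.2, -0.1]
h_est = estimate_impulse_response(h_true)
```

The recovery is exact here only because the response is shorter than the sequence and no measurement noise is present; with noise, the considerations of section 10.4 apply.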


TABLE II  Accuracy of Detection of Sine Wave Amplitude Using PNS and Cross Correlation (26)

  (Resonance Period / PNS Clock Period)      (Estimated Peak Amplitude / True Peak Amplitude)

  10.00                                       97.5%
  3.33                                        74%
  2.00                                        41%
  1.43                                        14%
  1.10
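The trend in Table II can be reproduced approximately from a simple model. The Python sketch below assumes that the estimate is the true response smoothed by the triangular autocorrelation function of base width two clock periods, for which a sinusoid is attenuated by a (sin x/x)^2 factor with x = pi * (clock period / resonance period). This model is an assumption made for illustration; reference (26) may have used a different analysis, so the figures should only be expected to be similar to the tabulated ones.

```python
import math


def peak_ratio(period_over_clock):
    """Estimated/true peak amplitude for a sinusoid under triangular smoothing."""
    x = math.pi / period_over_clock
    return (math.sin(x) / x) ** 2


# Print the modelled attenuation for the ratios discussed in the text.
for ratio in (20.0, 10.0, 3.33, 2.0, 1.43, 1.10):
    print(f"{ratio:5.2f}  {100.0 * peak_ratio(ratio):5.1f}%")
```

Under this model a ratio of 20:1 loses less than 1% of the peak, in line with the figure quoted in the text.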
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10.4  Noise  Rejection  Characteristics 

The error variance in estimating SUT impulse response via PNS and cross-correlation may be expressed
as, (21)

$$\sigma^2 = \frac{1}{T^2} \int_0^T \int_0^T \phi_{nn}(t_1 - t_2)\, \phi_{xx}(t_1 - t_2)\, dt_1\, dt_2 \qquad [28]$$

provided the noise and test signals are uncorrelated. If the corrupting noise is white, then
$\phi_{nn}(t) = \rho\,\delta(t)$, where $\rho$ is the spectral density, and $\delta(t)$ is the Dirac delta
function. Substituting in equation [28] and integrating results in the simple expression

$$\sigma^2 = \frac{\rho\, a^2}{T} \qquad [29]$$

so that the standard deviation is proportional to the signal level $\pm a$, and is inversely proportional to
the square root of measurement time divided by spectral density, i.e. the central limit theorem applies. A
further interesting conclusion from equation [29] is that the variance error is independent of delay time.
Fig. 12 shows the cross-correlation function for the SUT previously studied in Fig. 5 and measured in a
typical maintenance environment of 'a priori' indeterminate noise level and it can be seen that the variance
error is reasonably constant and is reduced by approximately $(1/\sqrt{r})$ if correlation takes place over r
sequences of PNS. If required, equation 29 may also be expressed as a signal-to-noise ratio. Since the
equivalent impulse generated by the PNS test signal is $a^2 \Delta t$, the signal-to-noise ratio will be:

$$\frac{S}{N} = h(\tau)\, a\, \Delta t \sqrt{\frac{T}{\rho}} \qquad [30]$$

10.5  Example on PNS Selection


Suppose we wish to identify the SUT of Fig. 4(a) using PNS and cross-correlation to estimate the
impulse response. The procedure is as follows,

(a)  From the observed decay time of the complete impulse response of 12 seconds, $N\Delta t > 12$.

(b)  From accuracy considerations in identifying the secondary mode, $\Delta t < 2\pi/(5 \times 10)$, if we
choose 10 clock intervals per period of the secondary mode.

If $\Delta t$ is made 0.05 s then N > 240, so that R = 8 giving N = 255 would be satisfactory for accurate
identification. The practical effect of varying $\Delta t$ and N is shown in Fig. 13 and confirms the chosen
PNS parameters. The remaining variables in the test schedule, a and r, determine the accuracy of the
estimates given a satisfactory choice of $\Delta t$ and N. This can be done by assuming a value of $\rho$ and
choosing a desired signal-to-noise ratio, but is better done when typical measurement noise characteristics
for a family of SUT's become available, since in the present author's experience the noise level can vary
considerably within a batch of apparently similar SUT's. For the example shown in Fig. 12(a), r = 1 is
considered too short a measurement time since the uncertainty band is comparable to the permissible variation
in performance within the 'healthy' family. If the $\pm 3\sigma$ bounds due to noise are to be kept within
$\pm 2$ units (i.e. $\pm 4\%$ of $I_p$, which is approximately 50 units), which is more reasonable, then
r = 4 would appear to be a suitable choice, since $\pm 3\sigma$ = 4 units for r = 1 and the standard
deviation falls as $1/\sqrt{r}$.
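The selection procedure of this example can be set out as a short calculation. The Python sketch below is illustrative only: the function names are hypothetical, and the secondary mode frequency of 5 rad/s is an assumption read from the expression 2π/(5 × 10) in step (b). The number of sequences r is chosen from the inverse square root law by comparing the 3-sigma band observed for a single sequence with the band required.

```python
import math


def select_pns(decay_time, mode_freq, clocks_per_period, clock_period):
    """Return (dt_max, n_min, R, N) for a candidate PNS clock period."""
    dt_max = 2.0 * math.pi / (mode_freq * clocks_per_period)
    if clock_period > dt_max:
        raise ValueError("clock period too long to resolve the secondary mode")
    n_min = decay_time / clock_period      # sequence must outlast the decay time
    stages = 1
    while 2 ** stages - 1 <= n_min:        # shortest register with N > n_min
        stages += 1
    return dt_max, n_min, stages, 2 ** stages - 1


def sequences_needed(three_sigma_single, three_sigma_target):
    """Number of PNS sequences r, assuming sigma falls as 1/sqrt(r)."""
    return math.ceil((three_sigma_single / three_sigma_target) ** 2)


dt_max, n_min, R, N = select_pns(decay_time=12.0, mode_freq=5.0,
                                 clocks_per_period=10, clock_period=0.05)
r = sequences_needed(three_sigma_single=4.0, three_sigma_target=2.0)
```

With these inputs the sketch returns R = 8, N = 255 and r = 4, matching the choices made in the example.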

10.6  Test  Time  Using  PNS 

For correlation over r sequence lengths, total correlation time T in equation 25 is $r N \Delta t$. It is
customary to allow one complete sequence of PNS to 'initialise' the SUT prior to correlation commencing,
so the total test time is $(r + 1) N \Delta t$. In the early days of PNS testing via special-to-type
instruments, only one delay line was available, so that a test time of $(r + 1) N \Delta t$ was required for
each point on $\phi_{xy}(\tau)$. More recent instruments have typically provided 100 delay lines, so that
100 points on $\phi_{xy}(\tau)$ can be estimated from a test time $(r + 1) N \Delta t$ in the so-called
'parallel' mode of Fig. 10. The PNS technique can also be implemented directly by digital computer, but care
must be taken to design the test schedule so as not to exceed the computer capacity, (1) although as seen in
section 6, this need not be a handicap in pre-flight testing since only a few test features are needed.
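The difference between serial and parallel mode test times follows directly from the expressions above. The Python sketch below uses the parameters of the section 10.5 example (r = 4, N = 255, clock period 0.05 s); the function name and the assumption that each pass through the instrument fills all available delay lines are illustrative.

```python
def pns_test_time(r, n, dt, points=1, delay_lines=1):
    """Total test time in seconds to estimate `points` values of the
    cross-correlation function with the given number of delay lines."""
    passes = -(-points // delay_lines)     # ceiling division
    return passes * (r + 1) * n * dt


serial = pns_test_time(4, 255, 0.05, points=100, delay_lines=1)
parallel = pns_test_time(4, 255, 0.05, points=100, delay_lines=100)
```

For 100 points the single delay line instrument needs 100 passes of 63.75 s each, whereas the 100 delay line instrument needs only one.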

10.7  Frequency  Response  Directly  from  PNS  Injection 

PNS stimuli have a precisely defined frequency domain $(\sin x/x)^2$ line spectrum with spectral lines
occurring at $2\pi/N\Delta t$, $4\pi/N\Delta t$, etc., with nodes occurring at integer multiples of the clock
frequency $2\pi/\Delta t$ rad/s. Because the input spectrum is so precisely defined, PNS may be regarded as a
parallel-mode frequency stimulus. Although the frequency response could be obtained from Fourier
transforming $\phi_{xy}(\tau)$ in the normal way, if only a few frequency data points are required for
checkout as suggested in section 9, significant reductions in computing time result from taking advantage of
the known spectral characteristics, which have been tabulated over the frequency range of interest. (26) The
approach is as follows.

At the r-th spectral line (at frequency $\omega_r$), the in-phase and quadrature components of the PNS
input are,

$$R_i(\omega_r) = A_{ri} \cos\theta_{ri}, \qquad Q_i(\omega_r) = A_{ri} \sin\theta_{ri} \qquad [31]$$

where $A_{ri}$ and $\theta_{ri}$ are known 'a priori'. If the return signal is correlated with
$\sin(\omega_r t)$ and $\cos(\omega_r t)$ in turn, as outlined in section 8, then the in-phase and quadrature
components of the SUT output are estimated as,

$$R_o(\omega_r) = A_{ro} \cos\theta_{ro}, \qquad Q_o(\omega_r) = A_{ro} \sin\theta_{ro} \qquad [32]$$

The required checkout data $|H(j\omega_r)|$ and $\theta_r$ can now be evaluated from,

$$|H(j\omega_r)| = \frac{A_{ro}}{A_{ri}} = \sqrt{\frac{R_o^2 + Q_o^2}{R_i^2 + Q_i^2}} \qquad [33]$$

$$\theta_r = \theta_{ro} - \theta_{ri} = \tan^{-1}(Q_o/R_o) - \tan^{-1}(Q_i/R_i) \qquad [34]$$
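The evaluation of gain and phase at a single spectral line from the in-phase and quadrature components can be sketched as follows. The function name and the numerical values are hypothetical; `math.atan2` is used in place of the plain arctangent so that each angle is placed in the correct quadrant, and the phase difference is not wrapped into any particular range.

```python
import math


def line_gain_phase(r_in, q_in, r_out, q_out):
    """Gain and phase (radians) of the SUT at one PNS spectral line."""
    gain = math.hypot(r_out, q_out) / math.hypot(r_in, q_in)
    phase = math.atan2(q_out, r_out) - math.atan2(q_in, r_in)
    return gain, phase


# Hypothetical line: input amplitude 2 at 0.3 rad, output amplitude 1 at -0.5 rad
gain, phase = line_gain_phase(2.0 * math.cos(0.3), 2.0 * math.sin(0.3),
                              1.0 * math.cos(-0.5), 1.0 * math.sin(-0.5))
```

Only four correlations per test frequency are needed (two of which, for the input, are known in advance), which is the source of the computational saving discussed in this section.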

As an example of the benefit obtained using this approach, if the FFT algorithm is incorporated, a
reduction in computation time of 30:1 is estimated if only three frequencies are required compared to the
method of obtaining the complete cross-correlation function first.

11.  Spectral Analysis Methods


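The auto spectral density $S_{xx}(\omega)$ is defined by

$$S_{xx}(\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi_{xx}(\tau)\, e^{-j\omega\tau}\, d\tau$$

and the cross spectral density similarly defined by

$$S_{xy}(\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi_{xy}(\tau)\, e^{-j\omega\tau}\, d\tau$$

and until recently system frequency response has been determined via these relationships, (28) since

$$H(j\omega) = \frac{S_{xy}(\omega)}{S_{xx}(\omega)}$$

is a frequency response estimator. However, it is generally preferable to compute $H(j\omega)$ directly from
the Fourier transform of the input and output signals using the relationships,

$$X(\omega) = \int_{-\infty}^{\infty} X(t)\, e^{-j\omega t}\, dt$$

$$Y(\omega) = \int_{-\infty}^{\infty} Y(t)\, e^{-j\omega t}\, dt$$

The SUT transfer function is then given by

$$H(j\omega) = \frac{Y(\omega)}{X(\omega)} = \frac{Y(\omega)\, X^*(\omega)}{X(\omega)\, X^*(\omega)}$$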

where * means complex conjugate. In practice, the calculations will be performed on discrete data, and
the resultant implications, and a detailed proof of the equivalence of the direct method with the correlation
method are to be found in reference (29). Computer implementation as suggested for testing electronic SUT's
is shown in Fig. 14, (30) the Fast Fourier Transform (FFT) being used in view of the enormous reduction in
computing effort thereby achieved. Reference (30) presents a number of broad spectrum frequency response
results similar to Fig. 5 but obtained using spectral analysis methods, although no recommended input stimulus
is given in that paper. In addition to PNS already suggested herein as a test stimulus, white noise (31)
and a fast frequency sweep (32) have been used as test signals in spectral analysis. The method has also
been successfully used to identify the in-flight aircraft transfer function relating aircraft motion to pilot
stick movement, in which the PNS stimulus was generated by the pilot responding to a flashing light display (33).
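The direct method can be illustrated on discrete data with a single-frequency discrete Fourier transform. The Python sketch below is a minimal example under assumptions that are not in the original text: the SUT is a hypothetical set of three impulse response weights, the input is a decaying exponential record chosen only because its spectrum has no zeros, the response is treated as periodic, and a plain DFT sum is used instead of an FFT for brevity.

```python
import cmath


def dft_bin(x, m):
    """Discrete Fourier transform of record x at frequency index m."""
    n = len(x)
    return sum(v * cmath.exp(-2j * cmath.pi * m * i / n)
               for i, v in enumerate(x))


def transfer_estimate(x, y, m):
    """Estimate H at index m as Y X* / (X X*)."""
    xm, ym = dft_bin(x, m), dft_bin(y, m)
    return (ym * xm.conjugate()) / (xm * xm.conjugate())


n = 32
x = [0.8 ** i for i in range(n)]                 # hypothetical input record
h = [0.5, 0.3, 0.2]                              # hypothetical SUT weights
y = [sum(hk * x[(i - k) % n] for k, hk in enumerate(h)) for i in range(n)]
m = 3
h_est = transfer_estimate(x, y, m)
h_true = sum(hk * cmath.exp(-2j * cmath.pi * m * k / n)
             for k, hk in enumerate(h))
```

In this noise-free periodic case `h_est` agrees with `h_true` to rounding error; the gain and phase follow as `abs(h_est)` and `cmath.phase(h_est)`.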

In order to check the influence of measurement and extraneous noise on spectral analysis estimates, the
coherency function,

$$\gamma^2(\omega) = \frac{|G_{xz}(\omega)|^2}{G_{xx}(\omega)\, G_{zz}(\omega)} = \frac{G_{yy}(\omega)}{G_{yy}(\omega) + G_{nn}(\omega)}$$

is used, values of $\gamma^2 = 1$ resulting from tests on a completely noise free linear system in which case
$G_{yy}(\omega) = G_{zz}(\omega)$, which is the quantity actually observed. The signal-to-noise ratio for the
SUT is then related to the coherency function by the expression

$$\frac{S}{N} = \frac{\gamma^2}{1 - \gamma^2}$$

which may be used at the SUT development stage as a guide to choosing a suitable test signal spectrum.
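As a numerical illustration, a measured coherency value can be converted to a signal-to-noise ratio. The Python sketch below assumes the usual relation for noise added at the SUT output, S/N = γ²/(1 − γ²); this relation and the function name are assumptions of the sketch and should be checked against the expression intended in the text.

```python
def snr_from_coherence(gamma_sq):
    """Signal-to-noise ratio implied by a coherency value 0 <= gamma_sq < 1."""
    if not 0.0 <= gamma_sq < 1.0:
        raise ValueError("coherency must lie in [0, 1)")
    return gamma_sq / (1.0 - gamma_sq)


snr = snr_from_coherence(0.9)
```

On this assumption a coherency of 0.9 corresponds to a signal-to-noise power ratio of about 9, and a value of 0.5 to equal signal and noise power.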

12.  'Closed Loop' Testing

All results so far discussed have been obtained under open-loop conditions, in which the aircraft loop
is not closed. The present author has participated in ground based 'closed loop' tests in which the
aircraft aerodynamics and kinematics were analogue simulated, the actual aircraft autopilot and actuators
being stimulated during the test, as shown in Fig. 15. The aircraft was situated adjacent to the simulator,
which must be calibrated against flight trials if a high level of confidence is to be achieved using this
method. Fig. 15 also shows the autoland phase simulated during the experimental work, and indicates the
three heights chosen for detailed small perturbation tests using PNS and cross-correlation. The aircraft
response varies considerably during the autoland phase, thus changing $\phi_{xy}(\tau)$ as a function of
height, and it was confirmed that the effect of changes in autopilot parameters is also observable in
$\phi_{xy}(\tau)$. (34) Two sample results are shown in Fig. 16, the initial negative excursion being a well
known result for an SUT with a system zero in the right hand plane. At the time the experiments were
conducted, the simulator exhibited reliability problems which would no doubt be overcome by present day
technology. It was not reasonably proven for this particular aircraft that the addition of the simulator
increases confidence in the operational status of the aircraft, mainly because the sensors are not excited
during the test so that sub-system testing was implemented during maintenance. However in other areas,
'closed loop' testing via simulation is being increasingly used, particularly in the automatic testing of
weapon systems, so that the method still warrants consideration for specific applications.


13.  Conclusions 


Dynamic testing is now a universal method of assessing the operational status of a wide variety of
systems ranging from amplifiers at one end of the spectrum to complete aircraft autoland systems at the
other. The advent of the FFT algorithm, coupled with the ready availability of digital computers, has had
a considerable effect on the implementation of dynamic test techniques. It cannot be emphasised too strongly
that it is the dynamic test data itself which is fundamental and the method of obtaining it is secondary to
the objective of selecting those test features which adequately discriminate between 'sick' and 'healthy'
systems. It is far better to undertake a manually controlled dynamic test of very simple form rather than
to have no dynamic test at all. It is hoped that this paper has adequately reviewed the basic guide lines
to be adopted in dynamic test stimulus and measurement feature selection.
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Automatic  terrain  following  system 

System  under  test 

Automatic  test  equipment 

Fast  Fourier  Transform 

Pseudo  noise  sequence 

Signal  to  noise  ratio 


Laplace  operator 
transfer  function  of  SUT 

coefficients  of  s^  in  system  transfer  function  numerator  and 

denominator  respectively 

order  of  polynomials  in  s 

SUT  input  signal 

SUT  output  signal 

SUT  impulse  response 

SUT  step  response 

SUT  ramp  response 

SUT  d.c.  gain 

SUT  time  constants 

SUT damping ratios

SUT  undamped  natural  frequencies 

product  sign 

SUT  zeros 

SUT  poles 

residue  terms  in  SUT  transient  response 
ratio  of  undamped  natural  frequencies 
time 

transpose  (in  test  feature  analysis) 
correlation  time 

ith  feature  used  to  checkout  SUT 
checkout gate width set on ith test feature
parameter  affecting  performance  of  SUT 
excitation  frequency 
SUT  phase  lag 
SUT  amplitude  ratio 
correlation  function 
measurement  noise 

time  delay  used  in  correlation  function 

peak  amplitude  of  PNS  pulse  and  of  sinusoidal  stimulus 

harmonic  number 

$\omega T/2\pi$ (in sinusoidal testing)

$(2^R - 1)$ (in PNS testing)
peak  step  response  of  SUT 
peak  impulse  response  of  SUT 

number  of  stages  in  PNS  shift  register 

PNS  clock  period 

standard  deviation 

spectral  density  of  white  noise 

number  of  sequences  of  PNS  over  which  correlation  takes  place 
spectral  density  of  signal 
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INTRODUCTION 

When  redundant  arrays  of  skewed  instruments  are  considered  as  a means  of  achieving  higher  reliability,  it  is  assumed  that  the  system 
is  able  to  detect  and  isolate  the  presence  of  a "failed"  instrument.  However,  it  is  difficult  to  establish  the  resulting  system  performance 
if  the  failure  detection  algorithm  uses  explicit  thresholds  to  reject  failed  instruments.  Details  about  the  distribution  of  errors  in  real 
instruments  are  seldom  known  with  sufficient  accuracy  to  justify  the  underlying  statistical  assumptions  used  in  setting  such  thresholds. 
Thresholdless detection algorithms, which do not depend on detailed knowledge of instrument error statistics, are therefore of special
practical significance.

As  an  example  of  a thresholdless  detection  algorithm,  consider  the  so-called  mid-value  selection  technique.  When  three  instruments 
are  used  to  measure  the  same  scalar  quantity,  such  as  one  component  of  vehicle  angular  rate,  a common  technique  is  to  take  the  middle 
value,  or  median,  as  the  estimate  of  the  measured  quantity.  Advantages  of  this  technique  are  that  it  employs  simple  logic;  requires  no 
statistical  information  about  the  instruments  for  its  implementation;  tolerates  one  instrument  failure  with  relatively  little  degradation 
in  system  performance;  and  operating  with  three  unfailed  instruments,  gives  errors  smaller,  on  the  average,  than  when  only  a single 
instrument  is  used. 
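Mid-value selection for three instruments measuring the same scalar quantity amounts to taking the median of the three readings. The Python fragment below is a minimal sketch; the function name and the sample readings are hypothetical.

```python
def mid_value_select(a, b, c):
    """Return the middle value of three redundant scalar measurements."""
    return sorted((a, b, c))[1]


# Two healthy rate sensors and one hard-over failure (illustrative values)
estimate = mid_value_select(1.02, 0.98, 57.0)
```

No threshold is involved: the failed reading is ignored simply because it cannot be the middle value, and the estimate remains bounded by the two unfailed readings.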

Applying  the  mid-value  selection  technique  directly  to  the  measurement  of  a 3-dimensional  vector  quantity  requires  a configuration 
of  nine  instruments,  with  three  instruments  aligned  along  each  input  axis.  Nearly  equivalent  redundancy  can  be  achieved  with  five  or  six 
skewed  instruments,  but  in  these  configurations  there  is  no  "middle  value"  to  select.  However,  by  reformulating  mid-value  selection  as 
an  algorithm  which  minimizes  an  appropriate  performance  index,  it  is  possible  to  extend  its  application  to  arbitrarily  skewed  arrays.  This 
extension  then  yields  bounds  on  the  system  performance  which  can  be  achieved  in  real  worst-case  situations. 

REDUNDANT  SENSOR  SYSTEMS 

Redundant  systems  are  considered  for  several  different  reasons.  One  of  these  is  simply  to  provide  a backup  so  that  an  operating 
system  may  be  "switched  out"  for  routine  maintenance  or  repair;  this  practice  is  common  among  power  generating  systems,  and  is 
found  in  commercial  aircraft  navigation  when  an  extra  inertial  measurement  unit  is  carried  aboard  in  a non-operational  spares  rack.  Of 
more  importance  for  the  purpose  of  improving  aircraft  flight  integrity  are  uses  of  redundancy  which  improve  operational  reliability  (in 
the  sense  of  increasing  the  probability  of  mission  success)  by  allowing  the  system  to  withstand  failures  when  immediate  repairs  cannot 
be made. The significant distinction between these two classes of redundant systems is that the latter must have the ability to detect and
isolate  their  own  failures,  and  must  be  capable  of  meeting  their  operational  performance  requirements  even  after  such  failures  have 
occurred.  As  will  be  seen,  it  is  this  latter  consideration  which  most  strongly  influences  system  design. 

It should be emphasized that great improvement in system reliability is gained by the use of even a modest degree of sensor redundancy.
For example, the probability of failure of a three-dimensional measuring system with three single-axis sensors is about three times
the failure probability of a single sensor (since the probabilities are much less than one). For a five-sensor system which will still meet its
requirements after one instrument has failed, the probability of system (mission) failure is reduced to about 10 times the square of the
single instrument failure probability. Thus for an instrument failure probability of $10^{-2}$, the system failure probability is only about $10^{-3}$,
and this probability will be reduced by an order of magnitude for each additional redundant instrument.
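
The approximations quoted above (about 3p for three sensors, about 10p² for five sensors tolerating one failure) can be checked against exact binomial expressions. The sketch below assumes independent failures with a common probability p; the function names and the value of p are illustrative assumptions only.

```python
from math import comb


def series_failure(p: float, n: int = 3) -> float:
    """Exact failure probability when all n sensors must work."""
    return 1.0 - (1.0 - p) ** n


def one_fault_tolerant_failure(p: float, n: int = 5) -> float:
    """Exact probability that two or more of n sensors fail."""
    none_failed = (1.0 - p) ** n
    one_failed = n * p * (1.0 - p) ** (n - 1)
    return 1.0 - none_failed - one_failed


p = 1e-2  # hypothetical single-instrument failure probability
print(series_failure(p), 3 * p)                            # exact vs. 3p
print(one_fault_tolerant_failure(p), comb(5, 2) * p ** 2)  # exact vs. 10p^2
```

The factor 10 is the number of ways of choosing which two of the five instruments fail.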

One  reason  often  advanced  for  considering  redundant  sensors  is  that  improved  system  performance  can  be  achieved  because  of  the 
averaging  effect  of  having  several  independent  measurements.  Unfortunately,  substantial  improvement  occurs  only  when  a very  large 
number  of  instruments  is  used.  For  example,  when  a vector  quantity  (e.g.,  acceleration  or  angular  rate)  is  measured  with  three  orthogonal 
single  axis  instruments,  the  error  in  estimating  each  vector  component  is  simply  the  error  of  the  respective  instrument.  When  four  such 
instruments are used in an optimal geometric configuration, the error in estimating each component is reduced to about 80% of the instrument
errors. To cut the estimation error in half requires 12 instruments in an optimal array, and to reduce it by an order of magnitude
would require about 300 instruments.
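
A short calculation shows why averaging pays off so slowly. It assumes that an "optimal" array of n unit-axis instruments with independent, equal-variance errors spreads its sensitivity evenly, so that a least-squares estimate has per-component error σ·sqrt(3/n). That formula and the name `error_ratio` are assumptions of this sketch; they reproduce the figures of 12 and 300 instruments quoted above and give roughly 0.87 for four instruments.

```python
from math import sqrt


def error_ratio(n: int, dim: int = 3) -> float:
    """Per-component estimation error relative to one instrument's error."""
    return sqrt(dim / n)


for n in (3, 4, 12, 300):
    print(n, round(error_ratio(n), 3))
```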

Thus,  as  a practical  matter,  the  use  of  redundant  sensors  to  obtain  improved  system  performance  is  seldom  economically  justifiable. 
In fact, as will be shown below, where redundancy is used to provide complete failure tolerance, the overall system performance will
necessarily be less than that which would be specified for a non-redundant system where such failure tolerance is not required.

SYSTEM CONFIGURATIONS

We assume for simplicity that all systems under consideration consist of single-degree-of-freedom measuring instruments. That is,
each  instrument  measures  the  projection  of  the  input  vector  along  a single  predetermined  instrument  "input"  axis.  In  practice,  such 
instruments  will  be  accelerometers,  gyros,  or  closely  related  sensors,  together  with  their  functionally  inseparable  electronics  (such  as 
torque  rebalance  servos). 


There is no essential loss in generality caused by ignoring more complex sensors, like two-axis accelerometers and two-degree-of-freedom
gyros. The analysis of redundant systems composed of such instruments is similar provided that 1) allowance is made for correlation
between the measurement errors found in separate axes of the same instrument, and 2) consideration is given to "ambiguous" failure
modes whereby detection of a failure on one measurement axis may or may not signal a failure of the instrument as a whole.

Many different configurations (geometric arrangements) of sensors have been proposed and analyzed in the literature [1], [2], [3], [4], [5].
However, apart from the total number of instruments used and the requirement that instrument axes be "reasonably" distributed, the
actual configuration has little effect upon the redundancy management approach.

The imprecise phrase "reasonably" distributed is needed to avoid a plethora of special considerations and qualifications. In order
to measure a three-dimensional quantity, at least three distinct measurement axes are required, not all lying in the same plane. It is not
necessary that they be orthogonal, but as a practical matter it is desirable that they be roughly so, in order not to distort badly inputs
lying in special directions (widely skewed arrays have been used to desensitize the instruments to certain directions). When redundant
instruments are used, it is normally desirable that no resulting set of three should be coplanar so that each instrument can provide redundancy
to all the others; to minimize distortion, each set of three should approach as far as possible an orthogonal array. On the other
hand, an often-used configuration has multiple instruments collinearly arranged along the three orthogonal directions. Such an array is
far from efficient, but it can be included in the general theory by adding sufficient qualifications. However, for simplicity here, we always
assume for the redundant arrays under consideration, that all sets of three instruments are non-coplanar.


Most  efficient  arrangements  fall  in  one  of  two  categories:  those  with  an  odd  number  of  instruments  uniformly  distributed  around  a 
cone,  and  those  with  an  even  number  of  instruments,  one  of  which  lies  along  the  central  axis  of  the  cone.  The  cone  angle  may  be  selected 
to equalize the estimation error in all directions; for n instruments with equal variances, this angle for the first category is found to be 54.74
degrees (the same as for three orthogonal instruments); for the second category, it is found to be that angle the square of whose cosine is
(n-3)/(3n-3) [6].
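
The two cone angles can be evaluated directly from the expressions above. The sketch assumes that "cone angle" means the half-angle between the cone axis and each instrument axis; the function names are hypothetical.

```python
from math import acos, degrees, sqrt


def cone_angle_all_on_cone() -> float:
    """First category: every instrument on the cone, cos^2(angle) = 1/3."""
    return degrees(acos(sqrt(1.0 / 3.0)))


def cone_angle_with_axis_instrument(n: int) -> float:
    """Second category: n instruments in total, one of them along the cone axis."""
    if n < 4:
        raise ValueError("need one axial instrument plus at least three on the cone")
    return degrees(acos(sqrt((n - 3) / (3 * n - 3))))


print(round(cone_angle_all_on_cone(), 2))            # 54.74
print(round(cone_angle_with_axis_instrument(6), 2))  # 63.43
```

For six instruments the second formula gives 63.43 degrees, which is the dodecahedron arrangement.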


Other arrangements will have somewhat poorer overall performance. In practice, other engineering considerations usually determine
which configuration should be used. For example, with five instruments arranged on the 54.74 degree cone, the error in estimating a
component of the input from all five measurements will be about 0.77σ, or about 0.95σ if only four of the measurements are used. When
the cone angle is opened up to 63.43 degrees (the dodecahedron), these errors will be increased to about 0.82σ and 1.0σ, respectively.
However, this latter arrangement turns out to allow a much simpler physical mounting structure, and this consideration will probably lead
to its choice in practice.
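
The four error figures quoted above can be reproduced numerically. The sketch assumes five instruments spaced evenly around the cone, independent errors of equal standard deviation σ, a least-squares estimate, and that "the error in estimating a component" means the RMS over the three components, i.e. σ·sqrt(trace((HᵀH)⁻¹)/3). These interpretations and all names are assumptions of this example.

```python
from math import cos, pi, radians, sin, sqrt


def cone_axes(count: int, half_angle_deg: float):
    """Unit input axes spaced evenly around a cone about the z axis."""
    a = radians(half_angle_deg)
    return [(sin(a) * cos(2 * pi * k / count),
             sin(a) * sin(2 * pi * k / count),
             cos(a)) for k in range(count)]


def rms_error(axes) -> float:
    """RMS per-component least-squares error, in units of sigma."""
    g = [[sum(h[i] * h[j] for h in axes) for j in range(3)] for i in range(3)]  # H^T H
    c00 = g[1][1] * g[2][2] - g[1][2] * g[2][1]
    c11 = g[0][0] * g[2][2] - g[0][2] * g[2][0]
    c22 = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    det = (g[0][0] * c00
           - g[0][1] * (g[1][0] * g[2][2] - g[1][2] * g[2][0])
           + g[0][2] * (g[1][0] * g[2][1] - g[1][1] * g[2][0]))
    # trace of the inverse = sum of diagonal cofactors / determinant
    return sqrt((c00 + c11 + c22) / (3.0 * det))


for angle in (54.74, 63.43):
    axes = cone_axes(5, angle)
    print(angle, round(rms_error(axes), 2), round(rms_error(axes[:4]), 2))
```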

Certain  configurations  are  especially  convenient  for  analytical  purposes.  Among  the  more  popular  of  these  is  the  six  instrument  array 
with  the  input  axes  normal  to  the  faces  of  a regular  dodecahedron.  Because  of  the  extent  to  which  this  configuration  has  been  treated  in 
the  literature,  a number  of  misconceptions  are  often  encountered  when  discussing  alternatives.  One  such  misconception  is  the  belief  that 
it is the special geometry of the dodecahedron which yields well-defined parity equations and simple algorithms for redundancy management.
In fact, all configurations have parity equations, and except in very special circumstances, impose approximately the same computational
complexity.

If h is a unit vector in the direction of the input axis of a given instrument, then the output m in response to a (vector) input x will be

$$m = h^T x + \epsilon, \quad \text{with } h^T h = 1,$$

where $\epsilon$ is the error in the measurement. For convenience, the constant of proportionality (scale factor) is neglected.

When there are n instruments in the system, there will be n such equations, so we may write

$$m = Hx + \epsilon$$

The matrix H is called the measurement matrix, and the rows of H will be referred to as the measurement axes.


We  assume  without  further  comment  throughout  the  following  that  the  rank  of  H is  equal  to  the  dimension  (normally  3)  of  the 
measured  vector.  It  cannot  be  more,  and  if  it  were  less  there  would  be  a component  of  the  measured  vector  which  would  not  affect 
the values of the measurements. Those configurations in which this might occur are excluded by the "reasonableness" criterion discussed
above.


As  an  example,  for  a "pentad"  array  formed  from  five  instruments  with  input  axes  arranged  normal  to  five  of  the  six  faces  of  a 
regular dodecahedron, we can write (depending on the labeling of the instruments)

$$H = \begin{bmatrix} s & 0 & c \\ -s & 0 & c \\ c & s & 0 \\ c & -s & 0 \\ 0 & c & s \end{bmatrix}$$

where s and c are the sine and cosine of half the angle (63.43 degrees) between adjacent instrument axes.

We will refer to this array in the examples below, but it should be noted that there is no particular significance to this choice, and
that it is made simply for analytical convenience.


MEASUREMENT  ERRORS 

All  inertial  sensors  have  various  kinds  of  deterministic  and  non-deterministic  errors.  These  include  constant  bias  offsets,  scale  factor 
errors, and sensitivity to spurious input and environmental factors. It is assumed throughout this paper that all instruments under consideration
have been calibrated and compensated to the extent that they may be adequately described as having unit scale factors and random
additive errors with zero mean and constant variance σ². When multiple instruments are considered, they are all assumed to have the same
variance.  The  (important)  case  of  scale  factor  errors  has  been  ignored  to  simplify  the  discussion. 

Except  for  the  assumption  that  errors  in  separate  instruments  are  statistically  independent,  no  special  assumptions  are  made  about  the 
distribution  of  the  instrument  errors.  It  is  the  authors'  belief  that  seldom  (if  ever)  in  practice  is  sufficient  data  on  the  distribution  of  real 
instrument  errors  available  to  justify  the  use  of  algorithms  based  on  higher  order  statistics.  This  is  especially  true  in  those  high  reliability 
cases  where  redundancy  management  techniques  are  finding  most  application.  The  instruments  which  are  selected  are  those  which  have 
been  shown  to  have  had  very  few  failures,  so  that  there  is  little  in  the  way  of  failure  data  on  which  to  base  statistics. 


FAILURES, FAILURE DETECTION, AND REDUNDANCY MANAGEMENT

The  question  of  what  constitutes  a "failed"  instrument  involves  certain  (phenomenological)  subtleties.  These  are  reflected  in  the 
fact  that  situations  exist,  such  as  when  the  quantity  being  measured  is  zero,  in  which  the  behavior  of  a perfectly  functioning  instrument 
is  indistinguishable  from  the  behavior  of  one  which  is  not  working  at  all.  In  practice  instruments  often  fail  by  small  degrees  so  that  the 
indicated measurement becomes contaminated with nearly unobservable error. For operational purposes we can define an instrument
as failed when it is contributing measurement errors sufficiently large as to jeopardize the system mission.

In an engineering sense an instrument is "good" only if it is somehow capable of measuring with the required degree of accuracy
all of those inputs to which it might be subjected. The attempt to measure this capability in practice results in the inclusion of built-in
test equipment (BITE), and the use of special detectors and circuitry to continuously monitor physical parameters such as supply voltage,
case temperature, wheel sync, and torquer resistance. There are two serious problems with this hardware approach apart from the
fact that the available set of useful parameters is quite limited. The first is that an out-of-tolerance condition on one of the monitored
parameters is at best only an implication of possible instrument failure. It is quite frequently found that an instrument which is not
working  correctly  is  working  well  enough  for  the  mission  to  succeed.  The  second  problem  is  that  all  additional  monitoring  circuitry  brings 
with  it  the  possibility  of  additional  failure  modes,  and  a new  requirement  to  monitor  the  failures  in  the  failure  monitors.  The  analysis 
of  such  systems  often  leads  to  a condition  which  has  been  termed  syllogistic  instability:  "this  circuit  is  good  unless  this  voltage  drops, 
which  can  only  happen  if  this  other  circuit  is  bad,  unless...".  The  chain  of  logic  involves  a kind  of  feedback,  and  it  may  be  very  difficult 
or  impossible  to  be  sure  the  system  will  work  as  intended  under  all  conditions.  It  is  perfectly  possible  to  design  a "fail  safe"  system 
which  is  permanently  disabled  by  its  own  logic. 

Redundant  systems  provide  a way  out  of  this  difficulty  by  allowing  decisions  regarding  failed  instruments  to  be  based  on  the  outputs 
of  the  instruments  themselves:  all  other  (possibly  erroneous)  failure  indications  can  therefore  be  ignored,  at  least  until  the  level  of 
redundancy  has  been  reduced  by  realized  failures  to  where  the  other  information  must  be  used  as  a "last  resort".  This  basic  approach  to 
failure  detection  has  been  called  the  "algorithmic  covering  technique".  Clearly  what  it  imposes  on  the  redundancy  management 
algorithm  is  the  requirement  that  it  be  able  to  isolate  or  tolerate  any  failure  large  enough  to  jeopardize  the  mission. 

To isolate or tolerate a single failure requires that the number of instruments must exceed the dimension of the measured quantity
by at least two. Thus five instruments are required to tolerate a single failure in 3-dimensional space. If the minimal number of instruments
required to tolerate one failure is used, the reliability of the system is just the probability that no two instruments fail. If more
than the minimal number of instruments is used, then the system may be able to tolerate additional failures. However, a difficulty is
introduced by the need to consider the possibility of simultaneous failures (simultaneous in the sense that one occurs before another has
been completely isolated). To see this, consider the case of six instruments in 3-dimensional space. When a single failure occurs and can
be isolated, the remaining configuration of five instruments possesses sufficient inherent redundancy to isolate a second failure. But if
two failures occurred simultaneously, we might not be able to say which set of four was still good unless we knew which sets of five contained
at most one failure. In fact it has been shown that to be sure of isolating k failures among $\ell$ instruments in n-dimensional space,
we must have $\ell \ge 2k + n$. Thus to provide certain failure tolerance of two failures under worst-case conditions in 3-dimensional space
requires the use of seven sensors. To avoid the complications inherent in multiple failure tolerance, we will discuss below only systems
designed to tolerate a single failure.
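
The counting rule above is easy to tabulate. The sketch below reads the rule as "at least 2k + n instruments", which matches the five-sensor and seven-sensor figures in the text; the function names are hypothetical.

```python
def sensors_required(failures: int, dimension: int = 3) -> int:
    """Smallest instrument count that guarantees isolation of `failures` faults."""
    return 2 * failures + dimension


def guaranteed_isolable_failures(sensors: int, dimension: int = 3) -> int:
    """Largest number of simultaneous faults that can always be isolated."""
    return max(0, (sensors - dimension) // 2)


for count in (5, 6, 7):
    print(count, guaranteed_isolable_failures(count))
```

Six instruments still guarantee only one isolable failure in the worst case, even though two failures occurring one after the other can be handled.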

In  accordance  with  the  algorithmic  covering  technique,  failure  detection  must  depend  only  on  the  disagreement  or  "inconsistency" 
between  redundant  measurements.  As  will  be  shown,  all  information  about  this  disagreement  resides  in  a set  of  linear  relationships  called 
parity  equations.  If  the  number  of  independent  parity  equations  were  equal  to  the  number  of  measurements,  then  the  equations  could  be 
solved  to  yield  the  measurement  errors.  The  fundamental  difficulty  in  identifying  failed  instruments  arises  from  the  fact  that  there  are 
necessarily  fewer  independent  parity  equations  than  instruments.  This  limitation  will  be  explored  in  what  follows. 

A number of interesting redundancy management algorithms have been proposed and analyzed in the open literature [4], [7], [8], [9], [10], [11], [12], [13].
Some of these may offer significantly better estimates under some conditions. A real difficulty is that the "best estimate"
under some conditions is not usually the "best strategy" under most conditions. An example will help to clarify this distinction.

Suppose  that  the  same  component  of  angular  rate  is  measured  (in  °/hr.,  for  example)  with  three  different  gyros,  each  having  the 
same standard error of 0.1, and giving 1.4, 1.2, and 1.9 respectively. If we think that all three gyros are good, then our best estimate of
the rate is probably 1.5, their arithmetic mean. The standard error of this estimate is about 0.06. On the other hand, if we think that
the  third  instrument  is  so  far  out  that  its  value  ought  to  be  disregarded,  then  our  best  estimate  is  probably  1.3,  the  arithmetic  mean  of 
the  other  two.  The  standard  error  of  this  estimate  is  about  0.07. 


Unfortunately, if we assume the first case when the second is true, we shall be off from the best estimate by 0.2, and if we assume
the second case when the first is true we shall also be off by 0.2. We will be somewhat safer if we choose the middle value 1.4, since we
will then be off by 0.1 at most, regardless of which case is actually true. However, the standard error in this estimate (the median of 3)
is about 0.08, which is somewhat larger than either of the other two. Restated, we choose the mid-value not because it is the best
estimate (it is in fact known not to be so good) but rather because it differs from the true value by the least amount, in the worst case,
when a (single) instrument has failed. It is the precise restatement of the mid-value selection technique in this form which we will
generalize below.
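
The arithmetic of the three-gyro example can be reproduced as follows. Only the three readings come from the text; the variable and function names are illustrative.

```python
from statistics import mean, median

readings = [1.4, 1.2, 1.9]  # the three gyro outputs from the example

all_good = mean(readings)          # best estimate if every gyro is good
third_failed = mean(readings[:2])  # best estimate if the third gyro is rejected
mid = median(readings)             # mid-value selection


def worst_case_miss(choice: float) -> float:
    """Largest distance from the best estimate under either hypothesis."""
    return max(abs(choice - all_good), abs(choice - third_failed))


for name, value in (("mean of 3", all_good), ("mean of 2", third_failed), ("mid-value", mid)):
    print(name, round(value, 2), round(worst_case_miss(value), 2))
```

The mid-value has the smallest worst-case miss of the three candidates, which is the minimax property that the generalization to skewed arrays preserves.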

PARITY EQUATIONS

If a pair of measurements $m_1$, $m_2$ are made of the same physical quantity then, in the absence of error,

$$m_1 - m_2 = 0$$

Such an equation is called a parity equation, since it asserts the equality of the two measurements. When the measurements contain
errors the parity equation becomes

$$(m_1 - \epsilon_1) - (m_2 - \epsilon_2) = 0$$

or, equivalently,

$$m_1 - m_2 = \epsilon_1 - \epsilon_2 = \eta$$

The value $\eta$ may be considered a measure of the lack of agreement or "inconsistency" in the measurements. Note that it is not directly
a measure of the error, since there are many combinations of measurement errors for which $\eta = 0$. When a third measurement $m_3$ is
considered, there are three possible parity equations which can be formed:

$$m_1 - m_2 = \epsilon_1 - \epsilon_2$$
$$m_2 - m_3 = \epsilon_2 - \epsilon_3$$
$$m_1 - m_3 = \epsilon_1 - \epsilon_3$$

It is important to notice that the third equation is a linear combination of the other two: in a strict sense there are only two degrees of
inconsistency between the three measurements. Consideration of a fourth measurement will introduce three more parity equations, but
only one more degree of inconsistency.

When a 3-dimensional vector quantity x is measured with a redundant array of instruments, the component of x along any one
instrument axis can be written in terms of its components along any other three. For example, consider four measurements, $m_1$, $m_2$,
$m_3$ and $m_4$. We can write

$$\begin{bmatrix} m_1 \\ m_2 \\ m_3 \end{bmatrix} = Qx, \quad \text{with } Q = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}$$

Then

$$m_4 = h_4^T x = h_4^T Q^{-1} \begin{bmatrix} m_1 \\ m_2 \\ m_3 \end{bmatrix}$$

Thus, there is an equation of the form

$$\lambda_1 m_1 + \lambda_2 m_2 + \lambda_3 m_3 + \lambda_4 m_4 = 0$$

which is uniquely determined (up to a constant multiplier) by the geometric orientation of the instruments. When measurement errors
are included we will have

$$\lambda_1 (m_1 - \epsilon_1) + \lambda_2 (m_2 - \epsilon_2) + \lambda_3 (m_3 - \epsilon_3) + \lambda_4 (m_4 - \epsilon_4) = 0$$

or, equivalently,

$$\lambda_1 m_1 + \lambda_2 m_2 + \lambda_3 m_3 + \lambda_4 m_4 = \eta$$

We will call such an equation a parity equation by analogy with the one-dimensional case. When the measurements are without error
$\eta = 0$. We call $\eta$ the inconsistency in the measurements which, though made along different directions, are not physically independent.
Note, however, that as explained earlier, $\eta$ is not directly a measure of the error, since there are many combinations of measurement
errors for which it also vanishes.

Now, if there are in fact $\ell$ instruments ($\ell > 3$), then there are

$$\binom{\ell}{4} = \frac{\ell!}{4!\,(\ell - 4)!}$$

distinct combinations of four measurements
which can be used to form different parity equations. Thus for five-instrument arrays, there are five distinct parity equations (that is,
distinct up to an arbitrary constant multiplier), and for six-instrument arrays there are 15 distinct parity equations. It is important to
note that not all of the parity equations are independent. In fact, the number of independent equations q is given by

$$q = \ell - n$$

where $\ell$ is the number of measurements and n is the dimension of the measured vector. Thus for five skewed instruments in 3-dimensional
space there are only two independent parity equations, and for six skewed instruments only three. The consequences of this will be
explored in detail below.
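
The two counts can be tabulated with a short function. It generalizes "combinations of four" to "combinations of n + 1" for an n-dimensional measured vector, which is an assumption of this sketch; for n = 3 it reduces to the case in the text. The function name is hypothetical.

```python
from math import comb


def parity_equation_counts(instruments: int, dimension: int = 3):
    """Return (distinct parity equations, independent parity equations)."""
    distinct = comb(instruments, dimension + 1)
    independent = instruments - dimension
    return distinct, independent


for count in (4, 5, 6):
    print(count, parity_equation_counts(count))
```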


A LOWER  BOUND  ON  THE  LARGEST  ERROR 

Since the coefficients $\lambda_i$ in each of the parity equations are determined only up to an arbitrary multiplier, we may assume without
loss of generality that they have been normalized so that the sum of the absolute values of the coefficients in each equation is unity.
The significance of this particular normalization is that we can then say of the measurements in any particular parity equation that there
is at least one error whose magnitude is as large as the magnitude of the "inconsistency" $\eta$. This fact is the basis of the Minimax failure
detection method of Potter and Deckert.
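
The bound follows in one line from the normalization. The steps below are a reconstruction using the notation of the parity equations above, where $\eta = \sum_i \lambda_i \epsilon_i$ and $\sum_i |\lambda_i| = 1$; they are not taken from the original text.

```latex
|\eta| \;=\; \Bigl|\sum_i \lambda_i \epsilon_i\Bigr|
       \;\le\; \sum_i |\lambda_i|\,|\epsilon_i|
       \;\le\; \Bigl(\max_i |\epsilon_i|\Bigr) \sum_i |\lambda_i|
       \;=\; \max_i |\epsilon_i|
```

Hence at least one of the instruments appearing in the equation has an error of magnitude no smaller than $|\eta|$.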


PARITY  SPACE 

Since there are only $q = \ell - n$ independent relations among the $\binom{\ell}{4}$ parity equations, it is possible to express the inconsistency values
entirely in terms of q independent variables, say $p_1, p_2, \ldots, p_q$. The vector p will be called a parity vector, and the q-dimensional space
of all parity vectors will be called parity space. All information contained in the $\binom{\ell}{4}$ parity equations is available from the q components
of the parity vector. Because the dimension of the parity vector is less than the dimension of the measurement vector m, considerable
simplification and insight is provided by expressing failure detection algorithms in terms of the parity vector.

Now, the parity equations are special linear combinations of the measurements whose values depend only on the measurement
errors. These values are independent of the value of the measured vector. As defined above, each parity equation involves only four
measurements, but it is clear that we could have defined each equation as a linear combination of all components of the measurement
vector m, so that each equation could be written in the form

$$\eta = y^T m$$

where y is an appropriately defined $\ell$-dimensional vector (some of whose components are zero).

To slightly generalize the concept behind the parity equations, let us define a parity function as any (possibly non-linear) function
of the measurement vector whose value is independent of the measured vector. Thus the parity equations defined above are linear parity
functions in this sense.

For an arbitrary (linear) parity function we have

$$y^T m = y^T (Hx + \epsilon)$$

Since by definition this equation must be independent of x it follows that $y^T H = 0$ or, equivalently, $H^T y = 0$. In other words, the most general
(linear) parity functions are simply the inner products of the measurement vector m with the null vectors of the matrix $H^T$.

To construct a parity vector p, first let V be a $q \times \ell$ matrix whose rows are any q linearly independent null vectors of $H^T$. Because
the $\ell \times n$ matrix H has rank n, the dimension of the null space of $H^T$ is $q = \ell - n$ and such a set of vectors can be found. For example, the
rows can be taken simply as the coefficients of the first q parity equations. Then by construction $VH = 0$, and we may take as a parity vector
the q-dimensional vector $p = Vm$. In order to show that p is a parity vector in the sense defined above, we must show that the values of all
the parity functions can be derived from it.

To do so, let K be any $n \times \ell$ left inverse of H so that $KH = I$, and define $x_b = Km$. Since the rank of H is n and $\ell > n$, an infinity
of left inverses exists.

Then we can write

$$\begin{bmatrix} x_b \\ p \end{bmatrix} = \begin{bmatrix} Km \\ Vm \end{bmatrix} = \begin{bmatrix} K \\ V \end{bmatrix} m$$

and m can be obtained by inversion provided the $\ell \times \ell$ matrix

$$A = \begin{bmatrix} K \\ V \end{bmatrix}$$

is non-singular.

To prove that A is non-singular, let a be an appropriately partitioned null vector of $A^T$ so that

$$A^T a = K^T a_1 + V^T a_2 = 0$$

Then

$$H^T A^T a = H^T K^T a_1 + H^T V^T a_2 = a_1 = 0$$

From this we conclude that $V^T a_2 = 0$.

And, since the columns of $V^T$ are linearly independent, $a_2 = 0$.

Thus, the only solution of $A^T a = 0$ is $a = 0$, $A^T$ is non-singular, and hence A is non-singular.

Now, partition $A^{-1}$ in terms of $\ell \times n$ and $\ell \times q$ matrices B, M so that $A^{-1} = [B, M]$.

Then

$$H = A^{-1} A H = [B, M] \begin{bmatrix} K \\ V \end{bmatrix} H = [BK + MV] H = B$$

since $KH = I$ and $VH = 0$ by construction.

Hence

$$m = A^{-1} \begin{bmatrix} x_b \\ p \end{bmatrix} = [H, M] \begin{bmatrix} x_b \\ p \end{bmatrix} = H x_b + M p$$

and any linear parity function can be written

$$\eta = y^T m = y^T H x_b + y^T M p = y^T M p$$

which was to be shown.
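
A small numerical sketch may make the construction concrete. It builds V from the coefficients of the first q = 2 four-instrument parity equations of a five-instrument array and checks that p = Vm depends only on the measurement errors, not on the measured vector. The particular axis labeling in `H`, the error values, and all function names are assumptions of this example; the coefficients use the identity that, for any four vectors in 3-space, the alternating sum of each vector times the determinant of the other three is zero.

```python
from math import atan, cos, sin

# Assumed geometry: five of the six face normals of a regular dodecahedron.
ALPHA = 0.5 * atan(2.0)  # half of the 63.43 degree angle between adjacent axes
S, C = sin(ALPHA), cos(ALPHA)
H = [(S, 0.0, C), (-S, 0.0, C), (C, S, 0.0), (C, -S, 0.0), (0.0, C, S)]


def det3(a, b, c):
    """Determinant of the 3 x 3 matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def parity_row(axes, idx):
    """Parity coefficients over the four axes in idx, scaled so sum(|lambda|) = 1."""
    i, j, k, l = idx
    lam = [0.0] * len(axes)
    lam[i] = det3(axes[j], axes[k], axes[l])
    lam[j] = -det3(axes[i], axes[k], axes[l])
    lam[k] = det3(axes[i], axes[j], axes[l])
    lam[l] = -det3(axes[i], axes[j], axes[k])
    scale = sum(abs(v) for v in lam)
    return [v / scale for v in lam]


def measure(axes, x, errors):
    """m = Hx + error for each instrument."""
    return [sum(hc * xc for hc, xc in zip(h, x)) + e for h, e in zip(axes, errors)]


def parity_vector(V, m):
    """p = Vm."""
    return [sum(v * mi for v, mi in zip(row, m)) for row in V]


V = [parity_row(H, (0, 1, 2, 3)), parity_row(H, (0, 1, 2, 4))]
errors = [0.0, 0.0, 0.3, 0.0, 0.0]  # hypothetical: the third instrument is off by 0.3

p_a = parity_vector(V, measure(H, (1.0, -2.0, 0.5), errors))
p_b = parity_vector(V, measure(H, (-7.0, 4.0, 9.0), errors))
p_clean = parity_vector(V, measure(H, (1.0, -2.0, 0.5), [0.0] * 5))
print(p_a, p_b, p_clean)
```

The two inputs give the same parity vector, and the error-free measurement gives a parity vector of zero to within rounding.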


BASE  VECTORS  AND  THE  LEAST  SQUARES  ESTIMATE 

It is convenient at this point to introduce the concept of a base vector. Consider the vector $x_b$ defined above. We have

$$x_b = Km = K(Hx + \epsilon) = x + K\epsilon$$

The vector $x_b$ equals x when there are no errors, and approximates it when the errors are small. We call any such vector a base vector
for x.

Note from the previous section that the measurement vector, $m = Hx_b + Mp$, is composed of two components, one defined on the
base vector and one on the parity vector. The first, $m_b = Hx_b$, may be thought of as the "consistent" part of the measurement, since if
$\eta$ is any parity function $\eta(m_b) = 0$. The second part, $m_r = Mp$, will be called the reduced measurement. It may be used instead of the
actual measurement vector m in evaluating any parity functions, since $\eta(m_r) = \eta(m)$.

The base vectors for x are not uniquely determined. In particular, any solution for x which is derived from n of the $\ell$ measurements
is a valid base vector. The significance of the base vectors is that they are all potential candidates to be used for the estimate of x in any
redundancy management algorithm.

One base vector of particular importance is the so-called least squares estimate. Where more than the minimum number of instruments
is available, the "best estimate in the least squares sense" is that value which minimizes the square of the length of the error vector,
that is

$$\epsilon^T \epsilon = (m - Hx)^T (m - Hx)$$

Let $\hat{x}$ be the solution to the (normal) equations

$$H^T H \hat{x} = H^T m$$

Then, with some manipulation, we can get

$$\epsilon^T \epsilon = (m - H\hat{x})^T (m - H\hat{x}) + (x - \hat{x})^T H^T H (x - \hat{x})$$

Since the second term on the right is a positive definite quadratic form, it follows that the minimum is obtained when $x = \hat{x} = Km$,
where $K = (H^T H)^{-1} H^T$. Note that $\hat{x}$ is a legitimate base vector since $KH = I$.

As an example, consider the pentad configuration defined above. By straightforward computation we obtain

$$\hat{x} = (H^T H)^{-1} H^T m = \frac{1}{2} \begin{bmatrix} s & -s & c & c & 0 \\ -c^2 s & -c^2 s & s(1 + c^2) & -s(1 + c^2) & 2c^3 \\ c(1 + s^2) & c(1 + s^2) & -cs^2 & cs^2 & 2s^3 \end{bmatrix} m$$

Each component of the least squares estimate is simply a fixed weighted sum of the instrument outputs. Whether this estimate is used
in practice depends on whether such computation is considered practical. Note, however, that in the strapdown context a computation
of this form is usually used to correct for scale factor and alignment errors anyway, so that use of the estimate in practice often entails
very little additional computation.
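
The closed-form weights for the pentad can be checked against a direct evaluation of (HᵀH)⁻¹Hᵀ. The sketch below assumes one particular labeling of the five dodecahedron axes in `H`, with s and c the sine and cosine of half the 63.43 degree angle between adjacent axes; that labeling, `K_closed` as a transcription of the closed form, and the helper names are assumptions of this example.

```python
from math import atan, cos, sin

ALPHA = 0.5 * atan(2.0)  # half of the 63.43 degree angle between adjacent axes
S, C = sin(ALPHA), cos(ALPHA)
# Assumed pentad labeling: five of the six dodecahedron face normals.
H = [(S, 0.0, C), (-S, 0.0, C), (C, S, 0.0), (C, -S, 0.0), (0.0, C, S)]


def inverse_3x3(m):
    """Inverse of a 3 x 3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]


def least_squares_gain(axes):
    """K = (H^T H)^-1 H^T, returned as a 3 x len(axes) list of rows."""
    gram = [[sum(r[i] * r[j] for r in axes) for j in range(3)] for i in range(3)]
    gram_inv = inverse_3x3(gram)
    return [[sum(gram_inv[i][k] * r[k] for k in range(3)) for r in axes]
            for i in range(3)]


K = least_squares_gain(H)

# Closed-form weights, written out entry by entry.
K_closed = [[0.5 * v for v in row] for row in (
    (S, -S, C, C, 0.0),
    (-C * C * S, -C * C * S, S * (1 + C * C), -S * (1 + C * C), 2 * C ** 3),
    (C * (1 + S * S), C * (1 + S * S), -C * S * S, C * S * S, 2 * S ** 3),
)]

max_diff = max(abs(K[i][j] - K_closed[i][j]) for i in range(3) for j in range(5))
print(max_diff)  # expected to be at rounding level
```

Applying `K` to a measurement vector is then a fixed 3 x 5 weighted sum, as the text notes.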


ORTHOGONAL  PARITY  SPACE 


When the least squares estimate $\hat{x}$ is used as a base vector, the measurement equation becomes

$$m = \hat{m} + \hat{\epsilon}$$

where by definition, $\hat{m} = H\hat{x}$ and $\hat{\epsilon} = m - \hat{m}$.

Note that $\hat{\epsilon}$ is really the reduced measurement $m_r$, so that $\hat{\epsilon} = Mp$. Since $\hat{x}$ is by definition the value of x which minimizes the length of
$m - Hx$, it is evident by simple geometry that $\hat{m}$ and $\hat{\epsilon}$ are orthogonal. Algebraically,

$$\hat{m}^T \hat{\epsilon} = m^T K^T H^T (m - HKm) = m^T [K^T H^T (I - HK)] m = 0$$

since $K = (H^T H)^{-1} H^T$

and $K^T H^T H K = H (H^T H)^{-1} H^T = K^T H^T = HK$.

Although the reduced measurement is uniquely determined by the selection of a particular base vector, the parity vector is not, since we have required so far only that V be of rank q and that VH = 0. We will now derive sufficient restrictions on V in order to make the parity vector unique, and we will do this in a natural way which makes it easy to determine M. We note in passing that if $p = Vm$ and $p' = V'm$ are any two parity vectors, then

$$p' = V'(Hx + Mp) = V'Mp$$

From the definition of M above,

$$MV = I - HK$$
$$MVV^T = (I - HK)V^T$$

so, providing $VV^T$ is non-singular, we have in general

$$M = (I - HK)V^T (VV^T)^{-1}$$

To prove that $VV^T$ is non-singular, we note that $V^T x = 0$ and $VV^T x = 0$ define the same solutions: $V^T x = 0$ clearly implies $VV^T x = 0$, and conversely $VV^T x = 0$ implies that $x^T VV^T x = (V^T x)^T V^T x = 0$ and thus, since V is real, that $V^T x = 0$. Therefore $V^T$ and $VV^T$ have the same rank q, but $VV^T$ is q x q and thus is non-singular.


When the least squares estimate is used for the base vector, then $(I - HK)V^T = V^T - HKV^T = V^T - [VHK]^T = V^T$, because HK is symmetric and VH = 0, so that $M = V^T (VV^T)^{-1}$. It would be convenient if we could also choose V so that $VV^T = I$, since then we would have $M = V^T$. But this can always be done: all we have required of V is that its rows be q linearly independent null vectors of $H^T$. If we choose these vectors to be orthonormal then $VV^T = I$ directly. Unfortunately, V is still not completely determined, since if U = RV and $R^T R = I$, then $U^T U = V^T V$. We may complete the specification of V by requiring that it be upper triangular with positive (non-null) diagonal elements. It can then be verified (we omit the proof) that if W = I - HK, then V is given by the following formulas, where $\ell$ is the number of measurements:

$$v_{11}^2 = W_{11}, \quad v_{ij} = 0 \text{ for } j < i, \quad v_{1j} = W_{1j} / v_{11} \text{ for } j = 2, \ldots, \ell$$

$$v_{ii}^2 = W_{ii} - \sum_{k=1}^{i-1} v_{ki}^2 \quad \text{for } i = 2, \ldots, q$$

$$v_{ij} = \Big( W_{ij} - \sum_{k=1}^{i-1} v_{ki} v_{kj} \Big) / v_{ii} \quad \text{for } i = 2, \ldots, q, \; j = i + 1, \ldots, \ell$$


For example, when this process is applied to the pentad described above, we get a 2 x 5 matrix V whose entries are expressions in s and c.

[The entries of this matrix are not legible in this copy.]
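
The triangular "square root" construction of V from W = I - HK can be sketched in a few lines of code. This is a minimal illustration, not the authors' implementation: the function name `parity_matrix` is hypothetical, W is passed as a list of rows, and the test case is the three-measurement scalar configuration (H a column of ones), for which W = I - (1/3) * ones.

```python
import math

def parity_matrix(W, q):
    """Return the q x l upper-triangular V (positive diagonal) with V^T V = W."""
    l = len(W)
    V = [[0.0] * l for _ in range(q)]
    for i in range(q):
        # Diagonal element: v_ii^2 = W_ii - sum over k < i of v_ki^2
        V[i][i] = math.sqrt(W[i][i] - sum(V[k][i] ** 2 for k in range(i)))
        # Rest of row i: v_ij = (W_ij - sum over k < i of v_ki * v_kj) / v_ii
        for j in range(i + 1, l):
            cross = sum(V[k][i] * V[k][j] for k in range(i))
            V[i][j] = (W[i][j] - cross) / V[i][i]
    return V

# Three measurements of a scalar: W = I - HK with H = [1, 1, 1]^T
W = [[(1.0 if i == j else 0.0) - 1.0 / 3.0 for j in range(3)] for i in range(3)]
V = parity_matrix(W, q=2)
for row in V:
    print([round(v, 4) for v in row])
```

Only the first q rows are computed, since the remaining rows of a full triangular factor of W would be zero (W has rank q).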


To summarize, the above development defines what we will call orthogonal parity space. In orthogonal parity space we have

$$m = \hat{m} + \hat{\epsilon}$$
$$\hat{m} = HKm$$
$$\hat{\epsilon} = V^T p$$
$$p = Vm$$

where

$$K = (H^T H)^{-1} H^T$$

and where V is upper triangular, has positive diagonal elements, and satisfies

$$V^T V = I - HK$$

Therefore, it follows that

$$VV^T = KH = I \quad \text{and} \quad VH = KV^T = 0.$$


Orthogonal parity space is important because of its close relationship to the concept of the "best estimate". It is particularly convenient for analytical purposes since physical symmetry in the orientation of the instrument axes tends to be preserved in geometric symmetries of the parity space.




MEASUREMENT  AXES  IN  PARITY  SPACE 

Corresponding to each of the measurement axes in real space (the physical instrument input axes), there is a special direction in parity space. If an error occurred in only one of the measurements, then the parity vector would lie along the corresponding special direction and its length would be proportional to the magnitude of the error. We will call these directions the measurement axes in parity space. Since $p = Vm = V\epsilon$, it is clear that they are simply the directions of the corresponding column vectors of V.

When orthogonal parity space is used we have for the reduced measurement, $\hat{\epsilon} = V^T p$, so that the reduced measurement corresponding to an error in a single measurement axis is simply the projection of the parity vector on the corresponding measurement axis in parity space. However, since $V^T V$ is not unity, the reduced measurement $\hat{\epsilon} = V^T V \epsilon$ corresponding to a single measurement error will have other non-zero components: the single error will be "distributed", and some of it will be "absorbed" into the input estimate represented by the base vector. The very need to estimate the input in the presence of error means that some of the error must contaminate the estimate. In fact, any set of errors such that $V\epsilon = 0$ is necessarily unobservable, whatever estimate is chosen.


AN  EXAMPLE 


In order to illustrate much of the above, we will apply the theory to the case of three measurements of a scalar (that is, a vector of dimension one) using orthogonal parity space. The example is particularly instructive because of the insight it provides into the mid-value selection technique.

$$m_1 = x + \epsilon_1$$
$$m_2 = x + \epsilon_2$$
$$m_3 = x + \epsilon_3$$

where $m_1$, $m_2$, $m_3$ are the measurements of x, and $\epsilon_1$, $\epsilon_2$, $\epsilon_3$ are the errors. In matrix form,

$$m = Hx + \epsilon \quad \text{with} \quad H = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$

The least squares estimate for the base vector (the scalar $x_b$) is

$$\hat{x} = Km \quad \text{with} \quad K = (H^T H)^{-1} H^T = \tfrac{1}{3} [1, 1, 1]$$

so that $\hat{x} = \tfrac{1}{3}(m_1 + m_2 + m_3)$.

Thus the estimate (the base vector) is simply the average of the measurements, as we would expect. The matrix $V^T V = I - HK$ gives

$$V^T V = \frac{1}{3} \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}$$

Using the square root formulas from above gives

$$V = \begin{bmatrix} \sqrt{2/3} & -1/\sqrt{6} & -1/\sqrt{6} \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}$$

It may be verified directly that $VV^T = I$ and $VH = 0$. The columns of V are vectors of length $\sqrt{2/3}$ which lie along the measurement axes in parity space. Plotting these vectors gives a measurement axis every sixty degrees (one is coincident with the $p_1$ axis).

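[Figure: the measurement axes in the $(p_1, p_2)$ parity plane. Axis 1 ends at $(\sqrt{2/3}, 0)$, axis 2 at $(-1/\sqrt{6}, 1/\sqrt{2})$ and axis 3 at $(-1/\sqrt{6}, -1/\sqrt{2})$.]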


The symmetry shown by the measurement axes in parity space reflects the symmetry of the H matrix (all measurements have equal weight). The components of the parity vector, as given by $p = Vm$, are

$$p_1 = (1/\sqrt{6})(2m_1 - m_2 - m_3)$$
$$p_2 = (1/\sqrt{2})(m_2 - m_3)$$

The components of the parity vector, as given by $p = Vm = V\epsilon$, are

$$p_1 = (1/\sqrt{6})(2\epsilon_1 - \epsilon_2 - \epsilon_3)$$
$$p_2 = (1/\sqrt{2})(\epsilon_2 - \epsilon_3)$$

We emphasize again that the parity vector is a function only of the measurement errors, and not the actual input.
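
The independence of the parity vector from the input can be checked numerically. The sketch below uses the V matrix of the three-measurement example; the error values and the two inputs are arbitrary illustrative numbers, and the helper name `parity` is hypothetical.

```python
import math

# Parity matrix for three measurements of a scalar (orthogonal parity space)
V = [
    [math.sqrt(2.0 / 3.0), -1.0 / math.sqrt(6.0), -1.0 / math.sqrt(6.0)],
    [0.0, 1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0)],
]

def parity(m):
    """Return p = V m for a list of three measurements."""
    return [sum(v * mi for v, mi in zip(row, m)) for row in V]

errors = [0.3, -0.1, 0.05]                      # illustrative measurement errors
p_small = parity([1.0 + e for e in errors])     # input x = 1
p_large = parity([250.0 + e for e in errors])   # input x = 250
p_errors = parity(errors)                       # V applied to the errors alone

print(p_small, p_large, p_errors)
```

Up to rounding, the three printed vectors agree, since VH = 0 removes the common input from every measurement.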

The parity equations, which is to say, the linear relations between $m_1$, $m_2$, $m_3$ which do not involve x, are simply whatever linear relations can be formed between the equations

$$2m_1 - m_2 - m_3 = 2\epsilon_1 - \epsilon_2 - \epsilon_3$$
$$m_2 - m_3 = \epsilon_2 - \epsilon_3$$

The most general linear parity function for this case has the form

$$\eta(m) = a m_1 + b m_2 + c m_3$$

From the requirement that $\eta(m)$ be independent of x, we conclude that

$$a + b + c = 0$$

When we express the parity function in terms of $p_1$, $p_2$ we get

$$\eta(m) = -\sqrt{3/2}\,(b + c)\,p_1 + (1/\sqrt{2})\,(b - c)\,p_2$$

Thus, in terms of failure detection, we may now say: If the parity vector corresponding to a particular set of three measurements were very large (that is, relative to the magnitude of "normally expected" errors), and if it lay very nearly along one of the measurement axes in parity space, then we could presume that the instrument corresponding to that axis had failed. We see clearly from the diagram how a combination of errors on the other two axes could have caused the same result, but in that case we would have two very large failures, a "disaster" against which we are powerless to contend by any means.

If the parity vector fell somewhere in between two axes, the situation would not be so clear. We might weight the measurements if we had some (a priori) knowledge about the error statistics. But in any event, we would give special credence to the axis which is "most normal" to the parity vector, since its errors must be contributing least to the inconsistency. Or we could avoid the question of statistics entirely and use only that axis all the time. This latter is in effect the mid value selection, and is summarized as follows.

[Figure: the parity plane with the three measurement axes (Instrument 1, Instrument 2, Instrument 3) and the sectors between them labeled Region 1, Region 2 and Region 3, each region appearing as a pair of opposite sectors.]

Region   Instrument   Estimate

1        1            $m_1 = \hat{x} + \sqrt{2/3}\,p_1$

2        2            $m_2 = \hat{x} - (1/\sqrt{6})\,p_1 + (1/\sqrt{2})\,p_2$

3        3            $m_3 = \hat{x} - (1/\sqrt{6})\,p_1 - (1/\sqrt{2})\,p_2$

The first column gives the region in parity space in which the parity vector is found to lie, as identified by the figure above. The second column shows the instrument whose output would be used. Note that in each case it is the instrument whose measurement axis in parity space is "most normal" to the region where the parity vector lies. The third column shows the estimate expressed in terms of the base vector (the mean of all three measurements) and the (algorithmic) correction which must be made to account for the possibility of instrument failures. On the average, the "best estimate" will be off by the RMS value of this correction. This shows directly how the overall system performance is degraded by the inclusion of failure tolerance.
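
The selection rule in the table can be sketched in code as follows. This is an illustration under stated assumptions rather than a flight implementation: the function name `mid_value_via_parity` is hypothetical, "most normal" is interpreted as the measurement axis (column of V) with the smallest absolute projection of the parity vector, and ties are not treated specially.

```python
import math

# Parity matrix for three measurements of a scalar (orthogonal parity space)
V = [
    [math.sqrt(2.0 / 3.0), -1.0 / math.sqrt(6.0), -1.0 / math.sqrt(6.0)],
    [0.0, 1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0)],
]

def mid_value_via_parity(m):
    """Return (instrument number, estimate) chosen from the parity vector."""
    x_b = sum(m) / 3.0                                        # base vector: the mean
    p = [sum(v * mi for v, mi in zip(row, m)) for row in V]   # parity vector p = V m
    # Projection of p on each measurement axis, i.e. the components of V^T p
    proj = [V[0][i] * p[0] + V[1][i] * p[1] for i in range(3)]
    # The axis "most normal" to p has the smallest absolute projection
    i = min(range(3), key=lambda k: abs(proj[k]))
    return i + 1, x_b + proj[i]                               # base vector plus correction

print(mid_value_via_parity([10.0, 10.2, 17.0]))
print(mid_value_via_parity([5.0, 1.0, 3.0]))
```

In both calls the returned estimate coincides, up to rounding, with the middle of the three measurements, which is the behaviour the table describes.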

STATISTICAL  FORMULAS 


It is interesting to note the simple form taken by statistical formulas in orthogonal parity space when the measurement errors are assumed to be independent, identically-distributed gaussian random variables. Let $\sigma$ denote the standard deviation of a single measurement error. Then the likelihood function (i.e., the probability density for the observed measurement m, given a particular input x) is given by

$$P(m \mid x) = \frac{1}{(2\pi)^{\ell/2} \sigma^{\ell}} \exp\left(-\frac{|p|^2}{2\sigma^2}\right) \exp\left(-\frac{(x - x_b)^T H^T H (x - x_b)}{2\sigma^2}\right)$$

The maximum likelihood estimate of the measured vector x is the base vector $x_b$, and the likelihood P at this value of x is given by

$$P_{max} = \frac{1}{(2\pi)^{\ell/2} \sigma^{\ell}} \exp\left(-\frac{|p|^2}{2\sigma^2}\right)$$

The probability density function for the parity vector itself may also be calculated. This probability density is given by the formula

$$P(p) = \frac{1}{(2\pi)^{q/2} \sigma^{q}} \exp\left(-\frac{|p|^2}{2\sigma^2}\right)$$

which is the same function except for a multiplicative factor.

This formula implies that the components of the parity vector are independent identically distributed gaussian random variables with standard deviation $\sigma$. Thus, the components of the parity vector have the same probability distributions as the measurement errors.

The square of the length of the parity vector is distributed according to the chi squared probability distribution with q degrees of freedom. In particular, if q = 2, the probability that the length of the parity vector is less than a fixed value, $\chi$, is given by

$$\Pr[\,|p| < \chi\,] = 1 - \exp(-\chi^2 / 2\sigma^2)$$

so that with ninety percent probability (setting the right-hand side to 0.9 gives $\chi = \sigma\sqrt{2 \ln 10} \approx 2.146\,\sigma$)

$$|p| < 2.15\,\sigma$$
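
For q = 2 the probability formula above can be inverted in closed form, which gives a simple way to compute a detection threshold on the parity vector length for any chosen probability. The sketch below assumes independent gaussian errors as in the text; the function name `parity_length_threshold` is hypothetical.

```python
import math

def parity_length_threshold(prob, sigma=1.0):
    """Length r with P(|p| < r) = prob for a 2-dimensional parity vector (q = 2)."""
    # Invert prob = 1 - exp(-r^2 / (2 sigma^2)) for r
    return sigma * math.sqrt(-2.0 * math.log(1.0 - prob))

for prob in (0.9, 0.99, 0.999):
    print(prob, round(parity_length_threshold(prob), 3))
```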


ESTIMATION ALGORITHMS


By an estimation algorithm we mean simply a computational process to estimate a physical quantity from noisy and possibly faulty measurements. The process may be explicitly or implicitly defined. Thus the "best estimate" from a set of good measurements was defined implicitly above as that x which gives the minimum of a particular function of the measurements, namely the length of the error vector $|m - Hx|$. This then led to the explicit analytical formula

$$\hat{x} = (H^T H)^{-1} H^T m.$$

In the following we shall formulate an estimation algorithm for the "safest estimate in the presence of possible failures" as the process of finding that x which gives the minimum of some performance index L which is a measure of the worst error we might make (in estimating x) when a failure occurs. It will be shown that this algorithm is equivalent to the mid-value selection in the case of three measurements of a scalar quantity.

Unfortunately,  it  does  not  appear  possible  to  express  the  resulting  algorithm  as  an  explicit  analytical  formula  in  the  most  general 
case.  However,  it  will  be  shown  how  parity  space  may  be  used  to  simplify  the  implementation  of  the  computational  process. 

A WORST-CASE  PERFORMANCE  INDEX 

For the moment, in order to simplify notation, let $\hat{x}$ be any estimate that we might make of the "input" x. What we need is some performance index L which, as a function of the measurement m and the estimate $\hat{x}$, is a measure of the "worst error" that might have been made by the estimate. What we desire is that this index not depend on any specific (a priori) knowledge about the statistics of the measurement errors.

Now,  it  would  appear  that  the  performance  index  should  be  a function  of  the  ratio  between  some  measure  of  the  estimation  error 
and  some  measure  of  the  input  error.  It  is  reasonable  to  require  that  they  be  related,  and  by  using  their  ratio  we  can  ensure  that  the 
estimation  error  will  not  be  arbitrarily  large  when  all  of  the  measurement  errors  are  small.  Of  course,  some  other  relationship  could 
be  used,  but  parsimony  (and  the  advantage  of  hindsight)  inclines  us  toward  this  choice. 

As a measure of the estimation error, it is most natural to use the length $|x - \hat{x}|$ of the error itself. However, there are a number of reasonable choices which might be taken as a measure of the error in the measurement, such as the maximum value among the individual errors, and the RSS of the individual errors (that is, the length of the error vector $|m - Hx|$). We shall use the latter for explicitness here, although it is possible to extend the theory by requiring only that the measures chosen have certain abstract properties.

Given these preliminaries, it is reasonable to propose for the performance index which is to be minimized,

$$L(m, \hat{x}) = \sup_x \frac{|x - \hat{x}|}{|m - Hx|}$$

where "sup" means the least upper bound over all x. For the moment we ignore the fact that the denominator may vanish for some measurements m and inputs x. This will happen whenever there is no error in the measurement. We will treat this important case in detail in a separate section below.

Informally,  we  may  interpret  the  performance  index  as  follows. 

Given a measurement m and an estimate $\hat{x}$, we may suppose that "nature conspires" to have made the actual input x such that we have made the worst possible error in our estimate, relative of course to the actual error in the measurement itself.

If we now require of our estimate $\hat{x}$ that it be chosen for every m so that $L_0 = L(m, \hat{x})$ be a minimum, then we will have

$$|x - \hat{x}| \le L_0\,|m - Hx|$$

and will have bounded the error in our estimate. Note that here $\hat{x}$ is a value which "realizes" the minimum $L_0$. (We will avoid introducing a special symbol to distinguish it from any estimate when there is no danger of confusion.)

Assuming that it is always possible, at least to any required degree of accuracy, to compute a value of $\hat{x}$ which minimizes $L(m, \hat{x})$, then we have implicitly defined the desired estimation algorithm. The set of values $\hat{x}$ which, corresponding to a given m, realize the minimum $L_0 = L(m, \hat{x})$ define a mapping $\hat{x} = F(m)$. If it could be analytically written out, it would constitute the explicit algorithm discussed above.


AN OPTIMAL ESTIMATE IN THE ABSENCE OF FAILURES

In  order  to  gain  some  understanding  of  the  performance  index  L,  let  us  first  attempt  to  minimize  it  on  the  assumption  that  there 
are  no  failures.  We  will  use  orthogonal  parity  space  to  simplify  the  calculations. 


Let

$$J(x, \hat{x}) = \frac{|x - \hat{x}|}{|m - Hx|}$$

and let us introduce new variables defined by

$$x = Ay + Km$$
$$\hat{x} = A\hat{y} + Km$$

where $K = (H^T H)^{-1} H^T$ and A is any n x n orthogonal matrix, so that $AA^T = I$.

Then

$$L(m, \hat{x}) = \sup_x J(x, \hat{x}) = \sup_y J(y, \hat{y})$$

To compute the least upper bound on J we can maximize $J^2$. But

$$J^2 = \frac{[x - \hat{x}]^T [x - \hat{x}]}{[m - Hx]^T [m - Hx]} = \frac{[y - \hat{y}]^T [y - \hat{y}]}{[\hat{\epsilon} - HAy]^T [\hat{\epsilon} - HAy]}$$

where $\hat{\epsilon} = m - HKm = V^T V m$ is the reduced measurement.

Using the fact that $H^T \hat{\epsilon} = H^T V^T V m = 0$, we get easily by direct expansion

$$[\hat{\epsilon} - HAy]^T [\hat{\epsilon} - HAy] = e^2 + y^T A^T H^T H A y \quad \text{where } e^2 = |\hat{\epsilon}|^2.$$

But because $H^T H$ is an n x n real symmetric matrix, it is always possible to choose A so that $A^T H^T H A$ is diagonal. Thus if $\lambda_i$ are the eigenvalues of $H^T H$, we can write:

$$J^2 = \frac{\sum (y_i - \hat{y}_i)^2}{e^2 + \sum \lambda_i y_i^2} \quad (\text{sum on } i = 1, \ldots, n)$$

Now we observe that the denominator is independent of the signs of the $y_i$, but that the numerator will be larger when the signs of the $y_i$ are taken opposite to those of the corresponding $\hat{y}_i$. From this it is clear that whatever value of $\hat{y}$ is chosen,

$$\sup_y J(y, 0) \le \sup_y J(y, \hat{y})$$

But by the definition of $\hat{y}$, this is equivalent to saying that whatever the value of $\hat{x}$,

$$L(m, Km) \le L(m, \hat{x})$$

We can realize the minimum by taking $\hat{x} = Km$, which is to say, the performance index $L(m, \hat{x})$ is minimized for the particular case under consideration by the least squares estimate.
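
The conclusion can be checked numerically for three measurements of a scalar. The sketch below approximates the least upper bound by a search over a finite grid of inputs x, so the values it returns are lower bounds on the true index; the measurement values, the grid and the function name `index_on_grid` are illustrative assumptions. For this configuration $H^T H = 3$, so the value $1/\sqrt{3}$ is used for comparison.

```python
import math

def index_on_grid(m, x_hat, half_width=1000.0, steps=20001):
    """Grid approximation of sup over x of |x - x_hat| / |m - Hx|, H a column of ones."""
    best = 0.0
    centre = sum(m) / len(m)
    for k in range(steps):
        x = centre - half_width + 2.0 * half_width * k / (steps - 1)
        denom = math.sqrt(sum((mi - x) ** 2 for mi in m))
        if denom > 0.0:                      # skip points where the denominator vanishes
            best = max(best, abs(x - x_hat) / denom)
    return best

m = [1.0, 1.2, 0.9]                          # illustrative measurements
x_ls = sum(m) / 3.0                          # least squares estimate Km (the mean)
L_ls = index_on_grid(m, x_ls)                # index at the least squares estimate
L_other = index_on_grid(m, x_ls + 0.5)       # index at some other estimate

print(L_ls, 1.0 / math.sqrt(3.0), L_other)
```

The index at the least squares estimate approaches $1/\sqrt{3}$ from below as the grid is widened, while the displaced estimate gives a markedly larger value.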


The minimum $L_0$ is then

$$L_0 = \sup_y J(y, 0) = \sup_y \sqrt{\frac{\sum y_i^2}{e^2 + \sum \lambda_i y_i^2}} \quad (\text{sum on } i = 1, \ldots, n)$$

and it can be shown (we will omit the proof) that this function has a maximum $L_0 = 1/\sqrt{\lambda_{min}}$, where $\lambda_{min} = \min[\lambda_1, \lambda_2, \ldots, \lambda_n]$.


With the above definition, we may now define the fault-tolerant estimation algorithm (that is, fault tolerant to one failure) as the process of selecting that value of $\hat{x}$ which minimizes $L(m, \hat{x})$. Unfortunately, the computation of the minimizing estimate $\hat{x}$ is in general difficult, and an analytic expression for the minimum is not at present known.

Note particularly that the effect of all values of i on the performance index must be considered in comparing any two values of $\hat{x}$; if $\hat{x}$ is chosen near the value which minimizes $L_1$, say, then some other $L_i$ is likely to be far from its minimum and larger than $L_1$, and it is the maximum $L_i$ which must be taken for L. For this reason the finally realized minimum value of L (that is, the value of the estimation error bound) for the fault-tolerant algorithm using $\ell$ measurements will in general be larger than when the optimal non-fault-tolerant algorithm (described previously) is used with $\ell - 1$ measurements.

THE  ZERO  DENOMINATOR  PROBLEM 

In the definition of the performance indices for both the fault-tolerant and non-fault-tolerant cases, we have ignored the fact that the denominator vanishes for certain values of the variable x. Since it has been assumed that a maximum has been taken with respect to all values of x, it is important that these cases be considered.

So far we have taken as a measure of the measurement errors the length of the error vector $|\epsilon| = |m - Hx|$ or, in the fault tolerant case, the length of the error vector corresponding to all but one of the measurements, $\sqrt{|\epsilon|^2 - \epsilon_i^2}$. As mentioned previously, some other measure of measurement error may be desirable in certain cases. Let us generalize the present discussion somewhat by assuming only that the measure is provided by some vector norm $\|\epsilon\|$, defined as a non-negative number such that, for any vectors x, y and scalar k,

1) $\|x\| > 0$ for $x \neq 0$ and $\|x\| = 0$ implies $x = 0$

2) $\|kx\| = |k|\,\|x\|$

3) $\|x + y\| \le \|x\| + \|y\|$

We note that these properties hold if $\|x\|$ is defined as the sum of the absolute values of the components of x; as the length or root-sum-square of the components of x; or as the maximum absolute value among the components of x. The latter norm will be used in a subsequent example.
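
The three norms mentioned above are easy to write down, and properties 2) and 3) can be spot-checked on sample vectors. The sketch below is only an illustration on arbitrary values (a numerical check on a few vectors is of course not a proof); the function names are hypothetical.

```python
import math

def norm_sum(x):
    """Sum of the absolute values of the components."""
    return sum(abs(v) for v in x)

def norm_rss(x):
    """Root-sum-square of the components (the ordinary length)."""
    return math.sqrt(sum(v * v for v in x))

def norm_max(x):
    """Maximum absolute value among the components."""
    return max(abs(v) for v in x)

x = [3.0, -4.0, 1.0]      # arbitrary sample vectors and scalar
y = [-1.0, 2.0, 0.5]
k = -2.5
for norm in (norm_sum, norm_rss, norm_max):
    homogeneous = math.isclose(norm([k * v for v in x]), abs(k) * norm(x))
    triangle = norm([a + b for a, b in zip(x, y)]) <= norm(x) + norm(y) + 1e-12
    print(norm.__name__, norm(x), homogeneous, triangle)
```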

The performance indices defined above are specializations of the function

$$L(m, \hat{x}) = \sup_x \frac{|x - \hat{x}|}{\|f(m, x)\|}$$

where f(m, x) is some vector function which is continuous in x. What we really want from a definition of the function $L(m, \hat{x})$ is that given any m and $\hat{x}$, the value L is the least upper bound such that for all x,

$$|x - \hat{x}| \le L\,\|f(m, x)\|$$

It is clear that by defining L as the least upper bound we can ignore the cases when f(m, x) = 0 since then the right-hand side is independent of the value of L.

By the use of "sup" in the definitions above we do not mean to imply that the least upper bound is finite. In general this will not be so. If $x_0$ is a solution of the equations f(m, x) = 0, then since f is continuous in x we can make $\|f(m, x)\|$ arbitrarily close to zero, and the least upper bound is infinite for all $\hat{x} \neq x_0$: therefore a finite least upper bound can be realized only when $\hat{x} = x_0$. To be precise we should define

$$L(m, \hat{x}) = \sup_{x \text{ in } S} \frac{|x - \hat{x}|}{\|f(m, x)\|}$$

where S is (defined to be) the set of all x such that $f(m, x) \neq 0$.


For the non-fault-tolerant case defined above we have $f(m, x) = m - Hx$. This function vanishes only when $Hx = m$, which will occur whenever the measurement is without error. Since by definition the rank of H is equal to the dimension n of x, there is at most one solution to $Hx = m$. Suppose this solution exists and call it $x_0$. Then $y = x_0 - x$ gives $m - Hx = Hy$, so that

$$L(m, \hat{x}) = \sup_{y \neq 0} \frac{|\hat{x} - x_0 + y|}{\|Hy\|}$$

and, taking $\hat{x} = x_0$ and writing $y = ku$ with u a unit vector and k > 0,

$$L(m, x_0) = \sup_{y \neq 0} \frac{|y|}{\|Hy\|} = \sup_{|u| = 1} \frac{|ku|}{\|kHu\|} = \sup_{|u| = 1} \frac{1}{\|Hu\|} \quad (u \text{ a unit vector})$$

which is clearly finite for any norm.


AN INTERESTING COMPARISON


The error bound given above, $L_0 = 1/\sqrt{\lambda_{min}}$, was obtained essentially on the basis of a worst case analysis. Suppose for comparison that the measurement errors are known to be independent, identically-distributed gaussian random variables. Then it is known from statistical estimation theory that the least squares estimate minimizes the RMS estimation error, and that the RMS estimation error is related to the RMS error in the measurements by the formula

$$L_{gaussian} = \frac{\mathrm{RMS}\,[\,|x - \hat{x}|\,]}{\mathrm{RMS}\,[\,|\epsilon|\,]} = \sqrt{\frac{1}{\ell}\,\mathrm{TRACE}\,(H^T H)^{-1}}$$

where $\ell$ is the number of measurements. Some typical values are tabulated below.

Configuration                                   $H^T H$           $L_0 = 1/\sqrt{\lambda_{min}}$    $L_{gaussian}$

Three measurements in one dimension             3                 $1/\sqrt{3} = 0.58$               1/3

Orthogonal triad                                diag(1, 1, 1)     1                                 1

Dodecahedron                                    diag(2, 2, 2)     $1/\sqrt{2} = 0.71$               1/2

Pentad (dodecahedron with one axis removed)     diag(1, 2, 2)     1                                 $\sqrt{2/5} = 0.63$
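
The tabulated values can be reproduced directly from the two formulas. The sketch below assumes that $H^T H$ is diagonal for each configuration, so that its eigenvalues are simply the diagonal entries; the diagonal values used are those implied by the table (for three scalar measurements, $H^T H = 3$ follows from H being a column of ones), and the function names are hypothetical.

```python
import math

def worst_case_bound(diag_hth):
    """L0 = 1 / sqrt(lambda_min) for a diagonal H^T H."""
    return 1.0 / math.sqrt(min(diag_hth))

def gaussian_ratio(diag_hth, n_meas):
    """sqrt(trace((H^T H)^-1) / l) for a diagonal H^T H and l = n_meas measurements."""
    return math.sqrt(sum(1.0 / d for d in diag_hth) / n_meas)

# (diagonal of H^T H, number of measurements) for each configuration
configurations = {
    "three measurements in one dimension": ([3.0], 3),
    "orthogonal triad": ([1.0, 1.0, 1.0], 3),
    "dodecahedron": ([2.0, 2.0, 2.0], 6),
    "pentad": ([1.0, 2.0, 2.0], 5),
}
for name, (diag, n_meas) in configurations.items():
    L0 = worst_case_bound(diag)
    Lg = gaussian_ratio(diag, n_meas)
    print(f"{name}: L0 = {L0:.2f}, L_gaussian = {Lg:.2f}")
```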


A FAULT-TOLERANT  ALGORITHM 

The algorithm defined above by the performance index $L(m, \hat{x})$ is not fault-tolerant because a failure means one component of $\epsilon$ will be large and thus $|\epsilon|$ will be large. From another viewpoint, the least squares estimate $\hat{x}$ weights all measurements and will be contaminated by the bad measurements.

Now we have defined a failure as a condition where an instrument is contributing more than a tolerable amount of error. If we knew that such a failure had occurred, we would define our performance index only on the measurements from the remaining unfailed instruments. In that case, we would have used for the measure of the measurement error,

$$\min_i \sqrt{|\epsilon|^2 - \epsilon_i^2} \quad (\text{min on } i = 1, \ldots, \ell)$$

since this will, in effect, delete the contribution from the instrument with the largest error. Corresponding to the use of this "reduced" configuration, we will have a "partial" performance index:

$$L_i(m, \hat{x}) = \sup_x \frac{|x - \hat{x}|}{\sqrt{|\epsilon|^2 - \epsilon_i^2}} \quad (i = 1, \ldots, \ell)$$

Then we may say, if we knew that all the instruments were good we would select $\hat{x}$ to minimize $L(m, \hat{x})$. And if we knew that the i-th instrument had failed, then we would select $\hat{x}$ to minimize $L_i(m, \hat{x})$ instead.

However, in general we will not know when an instrument has failed. At best we can attempt to estimate the errors (knowing that we might be using the failure in the estimate), but then we will have to set some thresholds (based on a priori knowledge) in order to decide when a failure had occurred. Clearly, the most pessimistic (and therefore safest) thing to do is simply to select the worst performance index from the set of $L_i(m, \hat{x})$, whether a failure has occurred or not. Thus, we may take for our fault tolerant performance index,

$$L(m, \hat{x}) = \sup_i L_i(m, \hat{x}) = \sup_{i,\,x} \frac{|x - \hat{x}|}{\sqrt{|\epsilon|^2 - \epsilon_i^2}} \quad (i = 1, \ldots, \ell)$$


To summarize, for the non-fault tolerant performance index, given m, either

1) There is exactly one value $x_0$ such that $Hx_0 = m$ and

   a) $L(m, \hat{x}) = \infty$ for $\hat{x} \neq x_0$

   b) $L(m, x_0) = \sup_{|u| = 1} \dfrac{1}{\|Hu\|}$ (finite)

or

2) There is no value $x_0$ such that $Hx_0 = m$ and

$$L(m, \hat{x}) = \sup_x \frac{|x - \hat{x}|}{\|m - Hx\|}$$

It can be shown that the latter bound is also finite for any norm (we omit the proof). It should be emphasized that in case (1) the algorithm will choose $x_0$ as the "best estimate" since it is the value which minimizes L; but this makes sense because $Hx_0 = m$ says that there is no observable inconsistency in the measurement.

The fault-tolerant case is complicated by the fact that the denominator in the $\ell$ partial performance indices $L_i$ can vanish under a number of additional conditions. To simplify discussion, let us introduce matrices $P_i$, each defined as the $\ell - 1$ by $\ell$ matrix which results from deleting the i-th row of the unit matrix. Then, for any x, $P_i x$ is the $\ell - 1$ dimensional vector which results from deleting the i-th component of x.

Using the $P_i$ matrices, the denominators of the $L_i$ are simply special cases of $\|f(m, x)\| = |f(m, x)|$ with $f(m, x) = P_i(m - Hx)$. That is,

$$|P_i(m - Hx)| = \sqrt{|\epsilon|^2 - \epsilon_i^2}$$

Following the previous discussion, we are interested in the cases where $f(m, x) = P_i(m - Hx) = 0$. Clearly, when there exists an $x_0$ such that $Hx_0 = m$, then the above results apply directly. This will occur not only when the measurement m is without error but also, for one of the $L_i$, when only the i-th measurement is in error. It is "very nearly" the situation when one of the measurement errors is much larger than the others (the hard failure case). There are also other situations. For example, consider the case of three measurements of a scalar x, where two of the measurements happen, by an accidental combination of errors, to give the same value. Then


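$$\begin{bmatrix} m_1 \\ m_1 \\ m_3 \end{bmatrix} = Hx = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} x$$

has no solution unless $m_3 = m_1$, but

$$P_3 m = P_3 H x$$

has the solution $x = m_1$ for all $m_3$.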


Suppose that $x_0$ is a value such that $P_i H x_0 = P_i m$. Then $L_i(m, \hat{x}) = \infty$ unless $\hat{x} = x_0$, and in this case we have, by the same arguments,

$$L_i(m, x_0) = \sup_{|u| = 1} \frac{1}{|P_i H u|}$$

We will assume that $\ell > n$ (else we would not be able to detect failures at all), that the rank of $P_i H$ is n, and $P_i H u \neq 0$ if $u \neq 0$. Then the denominator is never zero and $L_i$ is finite.

Note that when m is without error, an $x_0$ exists (it is the "true value") such that $P_i(m - Hx_0) = 0$ for each of the partial functions $L_i$, and thus each will realize its (finite) least upper bound $L_i(m, x_0)$ for the same value $\hat{x} = x_0$. On the other hand, given some other m there may be a solution $x_1$ such that $P_i(m - Hx_1) = 0$, and there is no reason to suppose that such a solution would necessarily satisfy $P_j(m - Hx_1) = 0$ if $j \neq i$. But if, for this value of m, there were a value $x_2$ such that $P_j(m - Hx_2) = 0$, then since $L_j(m, \hat{x})$ would realize its (only) finite value for $\hat{x} = x_2$, we could say for sure that $L_j(m, x_1) = \infty$. And since $L(m, \hat{x})$ is defined as the maximum over all the $L_i$ we would have $L(m, \hat{x}) = \infty$ for all values of $\hat{x}$. For this value of m we would be unable to form a (finite) estimate of the measured value.

Clearly, if we are to have a system which is well defined, then we must require that there be no two values $x_1 \neq x_2$ such that for any m and any $i \neq j$

$$P_i H x_1 = P_i m$$
$$\text{and} \quad P_j H x_2 = P_j m$$

Suppose such a case existed. Let $P_{ij}$ be the $\ell - 2$ by $\ell$ matrix which results from deleting the i-th and j-th rows of the unit matrix ($i \neq j$). Then $P_i H x_1 = P_i m$ and $P_j H x_2 = P_j m$ imply that

$$P_{ij} H x_1 = P_{ij} m, \quad P_{ij} H x_2 = P_{ij} m,$$

and therefore $P_{ij} H (x_1 - x_2) = 0$. But this implies that $P_{ij} H$ has rank less than n, the rank of H; and since $P_{ij} H$ consists of $\ell - 2$ rows from H, we can avoid the possibility of this result simply by taking $\ell - 2 \ge n$. This extremely important result is not completely unexpected since it takes at least n + 2 measurements to identify a failed measurement, even if the magnitude of the unfailed instrument errors is known.

THE  ALGORITHM  IN  PARITY  SPACE 

The  fault-tolerant  performance  index  LIm,  x ) will  first  be  written  in  terms  of  the  q-dimensional  parity  vector  p = Vm  and  the 
n-dimensional  base  vector  x^^  = Km.  No  special  assumptions  are  made  about  K except  that  KH  = I (that  is,  x^^  is  not  necessarily  the  least 
squares  estimator).  We  note  that  by  the  considerations  of  the  previous  section,  we  should  assume  that  q = - n >2.  This  makes  sense 

physically,  because  q = 1 would  mean  that  all  measurement  axes  in  parity  space  are  parallel  and  it  would  therefore  be  impossible  to  dis- 
tinguish between  different  failures. 

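For convenience let $y = x - x_b$ and $\hat{y} = \hat{x} - x_b$. Then

$$L(m, \hat{x}) = \sup_{y,\,i} \frac{\|\hat{y} - y\|}{\|P_i (m - H x_b - H y)\|} = \sup_{y,\,i} \frac{\|\hat{y} - y\|}{\|P_i (M p - H y)\|} .$$

In the case where $|p| \neq 0$, let $p = \rho u$ where $|u| = 1$ and $\rho > 0$, and define $z = \frac{1}{\rho}\, y$, $\hat{z} = \frac{1}{\rho}\, \hat{y}$. Then

$$L(m, \hat{x}) = \sup_{z,\,i} \frac{\|\rho\,(\hat{z} - z)\|}{\|\rho\, P_i (M u - H z)\|} = \sup_{z,\,i} \frac{\|\hat{z} - z\|}{\|P_i (M u - H z)\|} = L(Mu, \hat{z}) .$$

In the case where $p = 0$, we will have $\|P_i H y\|$ in the denominator, so we will have to take $\hat{y} = 0$, or $\hat{x} = x_b$. Then

$$L(m, x_b) = \sup_{y,\,i} \frac{\|y\|}{\|P_i H y\|} = L_0 .$$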

By these transformations we see that the performance index can be determined entirely from the $q$-dimensional unit vector $u$, rather than the $\ell$-dimensional measurement vector $m$. This provides, in the fault-tolerant case, a substantial simplification.


For notational convenience let us define

$$\lambda(u) = \min_{\hat{z}} L(Mu, \hat{z}) .$$

The set of $\hat{z}$ which realize the minimum constitute a relation $\hat{z} = \phi(u)$. Then we can formulate the fault-tolerant algorithm for the safest estimate $\hat{x}$ as follows:

1) Given $m$, construct $p = Vm$ and $x_b = Km$.

2) If $p = 0$, then $\hat{x} = x_b$, and the maximum error in the estimate is $L_0$.

3) Else construct $u = p / |p|$, set $\hat{x} = x_b + |p|\, \phi(u)$, and the maximum error in the estimate is $\lambda(u)$.

Of course, in the general case, the computation involved in the above is quite complex. However, in various practical situations it can be greatly simplified. For example, if $q = 2$, which is the case with five instruments in 3-dimensional space, the parity vector is 2-dimensional. Thus we can take

$$u = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} .$$

The functions $\lambda(u)$ and $\phi(u)$ can then be expressed in terms of the single variable $\theta$, and a practical algorithm can be obtained by expanding the components of $\phi$ in a Fourier series in $\theta$.
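
The bookkeeping in steps 1) to 3) can be sketched for a small case. The following is a minimal illustration only, assuming three measurements of a scalar (so $H = [1, 1, 1]^T$, $\ell = 3$, $n = 1$, $q = 2$). The particular matrices `V` and `K` and all function names are hypothetical choices made for this sketch; they merely satisfy $VH = 0$ and $KH = I$. The correction $\phi(u)$ is not computed here, since its form depends on the chosen error measure.

```python
import math

# Hypothetical illustration: three measurements of a scalar, H = [1, 1, 1]^T.
SQRT2, SQRT6 = math.sqrt(2.0), math.sqrt(6.0)
V = [[1 / SQRT2, -1 / SQRT2, 0.0],        # rows are orthonormal and satisfy V H = 0
     [1 / SQRT6, 1 / SQRT6, -2 / SQRT6]]
K = [1 / 3, 1 / 3, 1 / 3]                 # satisfies K H = 1 (least-squares choice)


def parity_and_base(m):
    """Step 1: parity vector p = V m and base estimate x_b = K m."""
    p = [sum(v * mi for v, mi in zip(row, m)) for row in V]
    x_b = sum(k * mi for k, mi in zip(K, m))
    return p, x_b


def unit_parity(p, tol=1e-12):
    """Steps 2 and 3: return (rho, u) with p = rho * u, or (0.0, None) if p = 0."""
    rho = math.hypot(p[0], p[1])
    if rho <= tol:
        return 0.0, None
    return rho, [p[0] / rho, p[1] / rho]


def parity_angle_deg(u):
    """For q = 2, u = [cos(theta), sin(theta)]; return theta in [0, 360)."""
    return math.degrees(math.atan2(u[1], u[0])) % 360.0
```

Because $VH = 0$, adding the same constant to every measurement changes `x_b` but leaves the parity vector unchanged, which is why the index can be studied as a function of $u$ (or $\theta$) alone.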
FURTHER EXAMPLES

Consider again the case of three measurements of a scalar quantity $x$:

$$m_1 = x + \epsilon_1, \qquad m_2 = x + \epsilon_2, \qquad m_3 = x + \epsilon_3 .$$

Let the measure of measurement error be the root-sum-square error

$$\|\epsilon\| = \Big( \sum_i \epsilon_i^2 \Big)^{1/2} ;$$

then

$$L_i(\hat{x}) = \sup_x \frac{|\hat{x} - x|}{\left[ (m_1 - x)^2 + (m_2 - x)^2 + (m_3 - x)^2 - (m_i - x)^2 \right]^{1/2}} .$$

Because of the symmetry of the problem, it is only necessary to evaluate one of the $L_i$ functions. Thus

$$L_3(\hat{x}) = \sup_x \frac{|\hat{x} - x|}{\left[ (m_1 - x)^2 + (m_2 - x)^2 \right]^{1/2}} .$$


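For convenience, make the change of variables

$$y = x - x_a, \qquad \hat{y} = \hat{x} - x_a, \qquad x_a = (m_1 + m_2)/2 .$$

With these variables, the quantity under the square root in the denominator becomes

$$(m_1 - x)^2 + (m_2 - x)^2 = (m_1 - y - x_a)^2 + (m_2 - y - x_a)^2 = \left( \frac{m_1 - m_2}{2} - y \right)^2 + \left( \frac{m_2 - m_1}{2} - y \right)^2 = 2\,(y^2 + d_{12}^2),$$

where $d_{12} = (m_1 - m_2)/2$, so that

$$L_3(\hat{x}) = \sup_y \left\{ \frac{|\hat{y} - y|}{\left[ 2\,(y^2 + d_{12}^2) \right]^{1/2}} \right\} .$$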


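To maximize $L_3(\hat{x})$, note that as $|y| \to \infty$, the quantity in braces approaches $1/\sqrt{2}$. Setting the derivative of the quantity in braces to zero yields

$$y = -\frac{d_{12}^2}{\hat{y}},$$

and substituting this value of $y$ gives for the expression in braces a value of

$$\left[ \frac{\hat{y}^2 + d_{12}^2}{2\, d_{12}^2} \right]^{1/2},$$

which is $\geq 1/\sqrt{2}$. Therefore

$$L_3(\hat{x}) = \sqrt{\tfrac{1}{2}\,(1 + \alpha_3^2)}, \qquad \alpha_3 = \frac{|\hat{y}|}{|d_{12}|} = \frac{|\hat{x} - \tfrac{1}{2}(m_1 + m_2)|}{\tfrac{1}{2}\,|m_1 - m_2|},$$

and by symmetry

$$L_i(\hat{x}) = \sqrt{\tfrac{1}{2}\,(1 + \alpha_i^2)}, \qquad \alpha_i = \frac{|\hat{x} - \tfrac{1}{2}(m_j + m_k)|}{\tfrac{1}{2}\,|m_j - m_k|},$$

where $j$ and $k$ are the other two values of the measurement index.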


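Since $L_i(\hat{x})$ is a monotonic function of $\alpha_i$, it follows that

$$L(\hat{x}) = \max_{1 \le i \le 3} L_i(\hat{x}) = \sqrt{\tfrac{1}{2}\,(1 + \beta^2)},$$

where

$$\beta(\hat{x}) = \max_{1 \le i \le 3} \alpha_i(\hat{x}) .$$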


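Finally, since the equations are symmetrical in the measurements, assume that

$$m_1 < m_2 < m_3 .$$

From the equation for $\alpha_i(\hat{x})$, it follows that

$$\alpha_2(m_1) = \alpha_2(m_3) = 1,$$

$$0 \le \alpha_2(\hat{x}) \le 1 \quad \text{if } m_1 \le \hat{x} \le m_3,$$

and otherwise

$$\alpha_2(\hat{x}) > 1 .$$

Similar inequalities hold for $\alpha_1(\hat{x})$ and $\alpha_3(\hat{x})$.

Thus, in the interval $m_1 < \hat{x} < m_3$, $\alpha_2(\hat{x})$ satisfies $\alpha_2(\hat{x}) < 1$, while either $\alpha_1(\hat{x})$ or $\alpha_3(\hat{x}) \geq 1$. Thus, in this interval,

$$\beta(\hat{x}) = \max\{ \alpha_1(\hat{x}),\ \alpha_3(\hat{x}) \},$$

and $\beta(\hat{x})$ is given by

$$\beta(\hat{x}) = \begin{cases} \alpha_1(\hat{x}) = \dfrac{m_2 + m_3 - 2\hat{x}}{m_3 - m_2} & \text{if } m_1 < \hat{x} \le m_2 \\[2ex] \alpha_3(\hat{x}) = \dfrac{2\hat{x} - m_1 - m_2}{m_2 - m_1} & \text{if } m_2 \le \hat{x} < m_3 \end{cases}$$

or

$$\beta(\hat{x}) = 1 + \begin{cases} \dfrac{2\,(m_2 - \hat{x})}{m_3 - m_2} & \text{if } m_1 < \hat{x} \le m_2 \\[2ex] \dfrac{2\,(\hat{x} - m_2)}{m_2 - m_1} & \text{if } m_2 \le \hat{x} < m_3 \end{cases}$$

and over the interval

$$m_1 < \hat{x} < m_3$$

the minimum value of $\beta$ is 1, and the minimizing value of $\hat{x}$ is $m_2$.

For $\hat{x} < m_1$,

$$\alpha_i(\hat{x}) > \alpha_i(m_1) \quad (\text{for } i = 1, 2, 3)$$

and

$$\beta(\hat{x}) > \beta(m_1) = 1 + \frac{2\,(m_2 - m_1)}{m_3 - m_2} > 1 .$$

Similarly, for $\hat{x} > m_3$,

$$\alpha_i(\hat{x}) > \alpha_i(m_3) \quad (\text{for } i = 1, 2, 3)$$

and

$$\beta(\hat{x}) > \beta(m_3) = 1 + \frac{2\,(m_3 - m_2)}{m_2 - m_1} > 1 .$$

Therefore, 1 is the minimum value of $\beta$ and $m_2$ is the minimizing value of $\hat{x}$. Inserting this value of $\beta$ into the equation $L(\hat{x}) = \sqrt{\tfrac{1}{2}(1 + \beta^2)}$, the minimum value of the performance index is also 1.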

Therefore, when the measure of measurement error is the root-sum-square error, the resulting estimate is the mid-value of the measurements, and the estimation error is always less than or equal to the root-sum-square of the two smallest measurement errors.

Note that when

$$\hat{x} = m_2$$

the components of the performance index have the values

$$L_1(\hat{x}) = 1, \qquad L_2(\hat{x}) < 1, \qquad L_3(\hat{x}) = 1,$$

and the estimator is trading off the penalty due to a failure in $m_1$ against the penalty due to a failure in $m_3$.
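
The closed form for the root-sum-square case can be checked numerically. The sketch below is illustrative only: it assumes three distinct scalar measurements, and the function names, the grid span and the step size are hypothetical choices. It compares $L(\hat{x}) = \sqrt{\tfrac{1}{2}(1 + \beta^2)}$ with a direct grid search over the true value $x$.

```python
import math


def alpha(x_hat, m_j, m_k):
    """alpha_i = |x_hat - (m_j + m_k)/2| / (|m_j - m_k| / 2); requires m_j != m_k."""
    return abs(x_hat - 0.5 * (m_j + m_k)) / (0.5 * abs(m_j - m_k))


def rss_index_closed_form(x_hat, m):
    """L(x_hat) = sqrt((1 + beta^2) / 2), with beta the largest alpha_i."""
    m1, m2, m3 = m
    beta = max(alpha(x_hat, m2, m3), alpha(x_hat, m1, m3), alpha(x_hat, m1, m2))
    return math.sqrt(0.5 * (1.0 + beta * beta))


def rss_index_brute_force(x_hat, m, span=10.0, step=0.002):
    """Grid approximation of max_i sup_x |x_hat - x| / RSS(residuals without m_i)."""
    lo, hi = min(m) - span, max(m) + span
    n = int(round((hi - lo) / step))
    best = 0.0
    for s in range(n + 1):
        x = lo + s * step
        for i in range(3):
            # Drop measurement i (the assumed failure) and use the other two.
            rest = [mj for j, mj in enumerate(m) if j != i]
            denom = math.sqrt(sum((mj - x) ** 2 for mj in rest))
            if denom > 0.0:
                best = max(best, abs(x_hat - x) / denom)
    return best
```

For example, with `m = (1.0, 2.0, 4.0)` the closed form evaluates to 1 at the mid-value `x_hat = 2.0` and is larger elsewhere, for instance at the mean of the three measurements.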


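As a second example, again consider the case of three measurements of a scalar quantity, but let the measure of measurement error be the maximum measurement error

$$\|\epsilon\| = \max_{1 \le i \le 3} |\epsilon_i| .$$

Again, by symmetry, it is only necessary to evaluate one of the components of the performance index. Employing the notation of the preceding example,

$$L_3(\hat{x}) = \sup_x \frac{|\hat{x} - x|}{\max(|m_1 - x|,\ |m_2 - x|)} = \sup_y \frac{|\hat{y} - y|}{\max(|y - d_{12}|,\ |y + d_{12}|)} = \sup_y \frac{|\hat{y} - y|}{|y| + |d_{12}|} = \sup_y \frac{|\hat{y}| + |y|}{|y| + |d_{12}|} .$$

Now

$$\frac{|\hat{y}| + |y|}{|y| + |d_{12}|} = 1 + \frac{|\hat{y}| - |d_{12}|}{|y| + |d_{12}|},$$

and if

$$|\hat{y}| \le |d_{12}|,$$

$L_3(\hat{x})$ is maximized by letting $|y| \to \infty$, yielding

$$L_3(\hat{x}) = 1 .$$

On the other hand, if

$$|\hat{y}| > |d_{12}|,$$

$L_3(\hat{x})$ is maximized by taking $|y| = 0$, and

$$L_3(\hat{x}) = \frac{|\hat{y}|}{|d_{12}|},$$

or, in either case,

$$L_3(\hat{x}) = \max\left[ 1,\ \frac{|\hat{y}|}{|d_{12}|} \right] .$$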
Thus, by symmetry,

$$L_i(\hat{x}) = \max\left[ 1,\ \alpha_i(\hat{x}) \right],$$

and maximizing over $i$,

$$L(\hat{x}) = \max\left[ 1,\ \beta(\hat{x}) \right] .$$

It was shown in the preceding example that

$$\beta(\hat{x}) > 1$$

unless

$$\hat{x} = m_2$$

and that

$$\beta(m_2) = 1 .$$

Again the resulting estimate is the mid-value of the measurements and the minimum value of the performance index is 1. In this case, the unit minimum value of the performance index implies that the estimation error is always less than or equal to the second largest measurement error. At the minimizing value of $\hat{x}$, the components of the performance index have the values

$$L_1(\hat{x}) = L_2(\hat{x}) = L_3(\hat{x}) = 1 .$$
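
The practical meaning of the unit index can be illustrated with a small simulation. This is a sketch under stated assumptions rather than part of the original analysis: the true value is taken as $x = 0$, two instruments have small Gaussian errors, and one instrument has a large arbitrary error representing a failure. The function names and the error distributions are hypothetical.

```python
import random


def alpha(x_hat, m_j, m_k):
    """alpha_i = |x_hat - (m_j + m_k)/2| / (|m_j - m_k| / 2); requires m_j != m_k."""
    return abs(x_hat - 0.5 * (m_j + m_k)) / (0.5 * abs(m_j - m_k))


def mid_value(m):
    """Mid-value (median) of three measurements."""
    return sorted(m)[1]


def max_norm_index(x_hat, m):
    """L(x_hat) = max[1, beta(x_hat)] for the maximum-error measure."""
    m1, m2, m3 = m
    beta = max(alpha(x_hat, m2, m3), alpha(x_hat, m1, m3), alpha(x_hat, m1, m2))
    return max(1.0, beta)


def worst_error_ratio(trials=1000, seed=0):
    """Largest observed |estimate error| / (second largest |measurement error|)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        # Two unfailed instruments and one failed instrument, in random order.
        eps = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.uniform(-100.0, 100.0)]
        rng.shuffle(eps)
        m = eps  # true value x = 0, so each measurement equals its error
        second_largest = sorted(abs(e) for e in eps)[1]
        if second_largest > 0.0:
            worst = max(worst, abs(mid_value(m)) / second_largest)
    return worst
```

The ratio returned by `worst_error_ratio` should never exceed 1, in line with the statement that the mid-value estimate has an error no larger than the second largest measurement error.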

SOME  NUMERICAL  RESULTS  ON  THE  PENTAD 

The pentad defined in the examples above is particularly interesting because it is a practical configuration of the minimum number of instruments needed to implement the algorithmic covering technique (that is, total failure detection by output comparison) in 3-dimensional space. In this case the parity space is 2-dimensional, and a number of symmetries exist which can be used to further simplify the computational problem. When the parity vectors corresponding to the individual measurement axes are plotted, we find that there is an axis every 36 degrees.

Consider the special case where the measure of measurement error is taken as the maximum absolute value among the component measurement errors:

$$\|\epsilon\| = \max_{1 \le i \le 5} |\epsilon_i| .$$

Because of the symmetries in this problem, it is possible to show (we omit the proof) that if $p$ is expressed as

$$p = \rho \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} \qquad (0 \le \theta < 360^\circ)$$

then it is only necessary to examine the values of $\theta$ such that $0 \le \theta \le 18^\circ$. It is fairly straightforward to organize the computation of $L(m, \hat{x})$ into choosing alternatives from a finite set of computable simple functions, and then to find an approximation to $L_{\min}$ by a minimum seeking computational process. Some approximate typical results are

    θ        L_min
    0°       5.33  (zero denominator case)
    6°       5.33
    12°      5.33
    18°      5.33


It appears that $L_{\min}$ is nearly constant (the above results were computed to a number of additional decimal places). A speculation is that it may be independent of $\theta$ for this case.* Thus, the maximum RSS estimation error is always less than about 5.33 times the second largest measurement error for this instrument array.

This  number  may  be  compared  with  the  value  4.80  which  is  the  ratio  of  the  maximum  RSS  estimation  error  for  the  bounding 
sphere  estimate  to  the  maximum  error  assumed  for  an  unfailed  instrument  in  the  presence  of  a single  failure.  The  bounding  sphere 
estimate  is  optimal  in  the  sense  that  it  minimizes  the  ratio  of  the  maximum  RSS  estimation  error  to  the  maximum  error  for  unfailed 
instruments  when  a hard  upper  bound  on  errors  of  unfailed  instruments  is  known.  If  there  is  a hard  upper  bound  on  unfailed  instrument 
errors,  the  present  estimate  will  have  an  estimation  error  bounded  by  5.33  times  the  unfailed  instrument  error  bound  in  the  case  of  a 
single  failure,  since  the  second  largest  instrument  error  will  be  less  than  the  unfailed  instrument  error  bound.  Thus,  we  are  paying  an 
estimated  error  penalty  of  5.33  - 4.80  = 0.53  times  the  maximum  unfailed  instrument  error  because  we  are  assuming  that  this  error  bound 
is  unknown.  Note  that  the  4.80  bound  for  the  bounding  sphere  estimate  was  obtained  by  a Monte-Carlo  simulation  of  measurement  errors. 
Since  these  may  not  have  exactly  sampled  the  worst  case  set  of  measurement  errors,  the  true  bound  for  the  bounding  spheres  estimate 
may  be  slightly  higher  than  the  4.80  figure. 

The components of the vector $\phi(\theta)$ defined in the previous section are corrections to the estimate to account for failure tolerance. For the angles 0°, 6°, 12° and 18°, they were found to be linear functions of $\theta$ to an accuracy of a few percent. Thus it appears possible to implement the fault tolerant algorithm for the symmetrical pentad with very little computational complexity.

SUMMARY 

The  above  discussion  covers  many  points  which  are  important  in  the  design  and  implementation  of  redundant  instrument  systems, 
and  provides  a basis  for  further  work  on  practical  thresholdless  fault  tolerance.  Two  important  results  are: 

(1)  A worst-case  performance  index  is  developed  which  requires  no  knowledge  about  the  statistics  of  the  instrument  errors.  It  is 
shown  that  the  least  squares  estimator  minimizes  this  index,  but  that  the  resulting  algorithm  is  not  fault-tolerant. 

(2)  A fault-tolerant  algorithm  is  then  defined  which  uses  the  worst-case  performance  index  to  bound  the  estimation  error  by  the 
product  of  the  index  and  the  RSS  of  the  individual  instrument  errors  with  the  largest  error  ignored:  the  algorithm  chooses  as  the  estimate 
the  value  which  minimizes  the  worst  case  index. 

The  important  point  to  emphasize  is  that  fault-tolerance  is  achieved  without  the  need  to  determine  any  thresholds,  and  without 
having  to  identify  when  or  in  which  instrument  the  failure  occurs.  Numerical  results  indicate  that  useful  error  bounds  are  realized. 


* Since preparation of this paper, an exact formula for this bound has been found.
SUMMARY

Today, avionics are demanding an increasing proportion of the resources available
for aircraft weapons systems. These avionics provide increased capability and
accuracy to the aircraft weapon system, but they are also a prime contributor to the
increased complexity and decreased reliability of the system. Digital avionics appears
to offer the desired increase in capabilities and performance without the normal
companions of low reliability, complexity, and high cost because it is amenable to
mechanization via solid state devices, it is more orderly and systematic, and it
provides growth and change without major hardware modification. To reap these
benefits, digital avionic integration requires standard equipment interfaces and a
standard approach to data intercommunication. The time division data bus is the
technique that permits this new concept of system integration to emerge. This paper
presents the evolution of the data bus, its standardization and its application. The
acquisition management and logistic benefits are also discussed.


DEFINITIONS 

1.  Time Division Multiplexed Data Bus: Throughout this paper, this term appears in
many shortened versions: multiplex data bus, data bus, bus, multiplexing, MUX and
MIL-STD-1553.

2.  Time Division Multiplexing (TDM): The transmission of information from several
signal channels through one communication system with different channel samples
staggered in time to form a composite pulse train.

3.  Remote Terminal: This is the electronics necessary to interface the bus with
the sub-system and the sub-system with the bus.

4.  Bus Controller: The controller shall be a unit that is either programmable,
or controlled by a processor, and that serves the function of commanding, scanning and
monitoring bus traffic.

5.  Message: A message is a transmission of words on the data bus cable. A message
transfer is complete when the command word, data word(s) and the status word have been
transmitted. There are three types of messages: controller to terminal, terminal to
controller and terminal to terminal.

6.  Words in a Message: In this document a word is a sequence of 16 bits plus sync
and parity. There are three types of words: Command, Status and Data.

INTRODUCTION 

Current US Air Force avionics acquisition practices breed "black box" proliferation
resulting in high cost, low reliability, and a heavy operating and maintenance burden.

A study (references 1, 2) was conducted to investigate the applicability of digital
techniques to today's high cost and proliferation of aircraft avionics. It was found
that in current aircraft, the avionics cost is approximately one-third of the total
system acquisition cost. Because of the lack of a "standard" integration approach,
equipment introduced into inventory is generally in low quantity and of low reliability
and, therefore, results in a logistic and maintenance nightmare. Even equipment that is
intentionally identical at the outset ends up unique because of the different interface
requirements in different systems, and so becomes incompatible.

One solution proposed by the study group was to view the aircraft avionics as an
"integrated system" rather than a conglomerate of functional sensors. The data bus
concept forces one to perform systems level analysis because information flow
definition is used as the key element in the orderly, standard integration process of
avionic equipment (black boxes).

Digital avionics will be used more and more in current and future aircraft systems
and the data bus will be the key to successful integration. Other requirements are
sensors built to the bus's standard digital interface, the liberal use of common digital
computers, modular/multipurpose controls and displays and a standard higher order
software language. When utilizing the data bus concept, it is important to realize that
a large amount of the integration is accomplished through software. Therefore, the final
and most critical integration step is performed through the effective application of
flight software; i.e., the software controls the real-time information flow.
THE DATA BUS

The digital time-division multiplex data bus is a tool to aid in system integration.
Originally introduced to save avionic hardware interconnect wiring weight and ease
computer interface requirements, it is now considered the cornerstone of digital
avionics. Through the use of the multiplexing technique, many other benefits can be
reaped. The resultant standard electronic interface permits a building block approach
to systems architecting and integration. Retrofits become easier, standard
interchangeable sensors become a reality, and maintenance is greatly improved.
Flexibility is probably the data bus's greatest attribute; reliability, built-in-test,
redundancy and graceful degradation are additional benefits. These gains are realizable
only if some form of standardization is adhered to; therefore, a military standard,
MIL-STD-1553A, was developed to provide some consistency in data bus design. This
standard has been used in a number of Air Force and Navy programs. Many new
applications are developing. DAIS is a laboratory program to help define and explore
applications of digital avionics. The Air Force is committed to the application of
digital avionic techniques and the multiplexed data bus concept is an important part
of it.

BACKGROUND 

Typical signal integration is difficult because sensor/equipment interfaces are
unique (non-standard signal characteristics), resulting in a significant weight and
volume penalty in large aircraft systems with dispersed avionics. Each sub-system has
its own unique input/output signalling formats, from non-standard analog, synchro,
discrete, digital serial and parallel, to countless combinations of these.

The majority of avionic sensors are designed to provide data in an analog form
most convenient to the sensor manufacturer or peculiar to a one-time application.
Therefore, the computer contractor, to properly communicate with the sub-systems, must
build a special purpose converter unit to interface the computer with these sub-systems.
Each analog or discrete signal is routed separately from each sub-system to this central
converter. The converter unit provides such things as signal terminations, conditioning,
sampling, scaling and conversion to the proper digital format. The analog to digital
and digital to analog converters must be extremely high speed because they are time
shared among all the input/output signals. Often this converter unit is twice as
complex as the computer and highly special purpose. That is, if any signal format is
changed or a new one added, there is a definite impact on the converter hardware.
Changes are often costly and major avionic modifications impossible without starting
from scratch.

A digital avionic multiplex data bus has a number of definite advantages.
Substantial wire savings can be realized by using the same data bus in a time sharing
mode. Wire and connector savings show up as weight savings. Ease of changing sensors
due to standard interfaces is another extremely desirable feature. The digital
transmission technique used is less susceptible to EMI and it is easier to detect and
correct errors. System configuration modifications can be performed by properly changing
the computer software. Computers will no longer become obsolete because they will be
truly general purpose (i.e., special purpose converter hardware will no longer be an
integral part of them). In this digital bus concept, sometimes called MUX for short,
the sensors that provide real time data to the computer or receive data from it are all
tied in parallel to a common multiplexed data bus. Transmission is digital under central
computer control. This requires federated conversion and federated data storage in a
sensor scratch pad memory. The sensor generates the data, converts it and stores it in
its own scratch pad (remote data memory) at its own iteration rate. The central
computer, under software control, samples these data memories as required by the
operational program and treats them as if they were an extension of the central
computer's main memory. Data gathering is accomplished through the use of the computer
programmed input/output controller.

An interesting advantage to this technique is that the data transfer rates on the
bus are much lower, requiring a much lower bandwidth transmission line. A one megahertz
data transmission rate is more than sufficient. A data word is transferred only
when needed rather than continuously as in the analog technique. (Continuous
transmission is required in the analog transmission technique because the sensor and
converter hardware never know when, or even if, the software requires the newly
generated information. Consequently, with the analog technique the computer memory must
always be updated with the most current data.)

In the multiplex data bus technique, the sensor scratch pad memories are treated as
an extension of the computer's main memory and their use is under program control.
Therefore, additions and deletions of sensors are easily accomplished. Sensor
modifications that change the sensor's analog input/output format will require the
sensor contractor to modify his converter and it will be totally his responsibility to
comply. These modifications will then be performed by the contractor in the sensor's
own lower speed converter unit, requiring only software modifications within the central
computer.

SYSTEM  APPLICATIONS 

The first systems application of this type of multiplexed data bus was in the
integration of the USAF's F-15 fighter aircraft (Figure 1). Some of the reasons why
the data bus was originally used were:

1.  Weight savings (wire, connectors)

2.  Simpler wire routing in aircraft (fewer bulkhead holes, clamps, connectors)

3.  Fewer data transfers (sensor information transmitted on demand only)

4.  Ease of changing sensors (made possible because of the resulting standard
interface)

5.  Less tendency to obsolescence (sensors, computers)

6.  Less electromagnetic interference (because of digital transmission)

7.  Higher reliability (better error detection, fewer hardwire interconnections)

8.  Permits retrofits (eases modification and growth problems)

9.  Lends itself to simplified built-in-test (BIT) techniques

Once the concept was proven feasible, another major USAF system, the B-1 strategic
bomber, designed its avionic system around the data bus (Figure 2). Three separate bus
systems are utilized in the B-1 avionic systems.

The first one, quad redundant, is used for electrical power switching control and
management (EMUX); the second, dual redundant, for mission avionic sensor integration,
both offensive and defensive (AMUX); and the third, with no redundancy, for
built-in-test purposes, called the "Centrally Integrated Test System" (CITS).

At this point in time, an undesirable trend was perceived in the development of
multiplex data bus (MUX) techniques, namely that MUX systems were beginning to
proliferate to an extent which was threatening to obviate the gains which they had
provided as noted above. A number of different MUX buses had been built, all of which
possessed similar architectures, data rates, encoding techniques, and technology.
However, there was sufficient difference in their signal formats and protocol,
interfaces, and functional operation to preclude any common hardware or interfaces.
Further, it was apparent that the use of MUX techniques was spreading into other
non-traditional avionic areas such as engine management, stores management, flight
instrumentation and flight controls. In summary, the time was ripe for standardization
of MUX systems.

Drawing on the F-15 and B-1 experience, and reviewing anticipated future avionics
needs, a draft standard was prepared that resulted in MIL-STD-1553 (USAF), 30 Aug 1973.
Resultant tri-service negotiations, based on US Navy and Army application requirements,
caused the standard to be upgraded and it was published as a tri-service document,
MIL-STD-1553A, on 30 April 1975.

THE  MILITARY  BUS  STANDARD 

MIL-STD-1553A, entitled "Aircraft Internal Command/Response Time Division
Multiplex Data Bus", covers the overall systems requirements, architecture, operational
protocol, and interfaces within the multiplexed data bus system. It is intended to
establish uniform requirements for all multiplex applications, not just avionics, and
provides for commonality of electronic functions and interfaces within the total air
vehicle. The standard was written so as to permit the system designer a maximum amount
of design flexibility while retaining a common inter-weapon system interface. As a
result, MIL-STD-1553A must be employed in conjunction with detailed air vehicle or
subsystem specifications.

Figure 3 illustrates an elemental bus configuration using MIL-STD-1553A. The bus
system has three major components: the bus controller, the data bus itself, and remote
terminals or subsystems. The bus controller acts as the focal point for the data bus
and the integration of the subsystems connected to the bus. The bus controller is a
software programmable device, a computer or dedicated microprocessor, which issues the
commands to initiate data transfers over the data bus. No remote terminal transmits a
signal without the transfer being initiated by the controller; it commands, and the
remote terminals respond, hence the description "command/response" architecture. When
growth is required in the system, the new subsystem is added and given a unique address.
The only change required to the existing equipment is a modification to bus controller
software, thus providing a high level of flexibility without major equipment
modifications.

The data bus itself is a shielded twisted wire pair, operating in a classical
balanced transmission line mode. Redundancy (i.e., multiple data buses) is the system
designer's option (Figure 4). Transformer coupling is used between the bus and the
terminals; the signal on the bus when transmitting data is a serial bit stream,
Manchester Bi-Phase Level encoded, and transmitted at a 1.0 MHz rate.
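
Manchester bi-phase level encoding can be illustrated with a short sketch. The polarity convention used here (a logic one sent as a high half-bit followed by a low half-bit, a logic zero as low then high) is an assumption for illustration, since the text above does not state it; the function names are likewise hypothetical. At a 1.0 MHz bit rate each bit occupies 1 microsecond, so each half-bit level lasts 0.5 microsecond.

```python
def manchester_encode(bits):
    """Map each data bit to two half-bit levels: 1 -> (1, 0), 0 -> (0, 1)."""
    halves = []
    for b in bits:
        halves.extend((1, 0) if b else (0, 1))
    return halves


def manchester_decode(halves):
    """Inverse of manchester_encode; rejects streams lacking a mid-bit transition."""
    if len(halves) % 2:
        raise ValueError("odd number of half-bits")
    bits = []
    for first, second in zip(halves[0::2], halves[1::2]):
        if first == second:
            raise ValueError("no mid-bit transition")
        bits.append(1 if (first, second) == (1, 0) else 0)
    return bits
```

Every encoded bit contains a transition at its midpoint, which is what lets a receiver recover timing from the data stream and flag a missing transition as an error.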

The remote terminals provide the hardware necessary to interface the subsystem
to the bus; this includes transmitter/receiver, encoder/decoder, error checking, and
holding registers. The remote terminal may exist as a separate line replaceable unit
(LRU), or be built into a subsystem which interfaces to the data bus (see Figure 3).
In general, any moderately complex subsystem (e.g., an inertial subsystem) will
interface directly to the bus, whereas the relatively simple or low data rate subsystems
(e.g., a TACAN or doppler altimeter) would interface to the bus through a remote
terminal which can handle more than one of these.
Three types of words are used to make up the messages which are transmitted
between subsystems on the data bus: Command, Data and Status words, as shown in
Figure 5. Command words are transmitted only by the bus controller, Status words
only by remote terminals, and Data words by each. All words are preceded by a unique
synchronization waveform which is distinguishable from the remainder of the Manchester
encoded bits. The Command word is divided into four fields: first, the address
of the terminal to which the Command word is directed; second, the transmit/receive bit
which indicates the remote terminal's action; third, a subaddress/mode field to be
defined by the remote terminal designer; and fourth, a count of the number of words to
be transferred. The last bit in the word is used to provide odd parity over the
preceding sixteen bits.
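
The Command word layout can be sketched as a bit-packing routine. The field widths used below (5-bit terminal address, 1-bit transmit/receive, 5-bit subaddress/mode, 5-bit word count) are an assumption not stated in the text above, chosen so that the four fields total sixteen bits; the function names are hypothetical, and the word count is treated simply as a raw field value.

```python
def odd_parity_bit(value16):
    """Parity bit that makes the number of ones in the 16 bits plus parity odd."""
    ones = bin(value16 & 0xFFFF).count("1")
    return 0 if ones % 2 else 1


def pack_command_word(address, transmit, subaddress, word_count):
    """Return (16-bit field value, parity bit); widths 5/1/5/5 are assumed."""
    if not (0 <= address < 32 and 0 <= subaddress < 32 and 0 <= word_count < 32):
        raise ValueError("field out of range")
    value = (
        (address << 11)                  # terminal address, most significant field
        | (int(bool(transmit)) << 10)    # 1 = terminal is to transmit, 0 = receive
        | (subaddress << 5)              # subaddress/mode field
        | word_count                     # number of words to be transferred
    )
    return value, odd_parity_bit(value)
```

For instance, `pack_command_word(3, True, 1, 4)` gives the field value `0x1C24`, which already contains an odd number of ones, so the parity bit is 0.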

The  Data  word,  which  contains  the  information  to  be  transferred,  has  one  sixteen 
bit  field,  followed  by  parity. 

The Status word, which is always transmitted by the terminal in response to a
command, is also partitioned into four fields. First is the terminal address; second,
a message error bit to indicate the failure of the preceding message (e.g., a parity
error in one of the preceding words in the message); third, nine bits which can be used
for defining status codes; and fourth, the terminal flag bit, which indicates the need
for the bus controller to request status and self-test information from the terminal.
Parity is also provided for the Status word.
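
A matching sketch for the Status word follows. The one-bit message error field, the nine status code bits and the one-bit terminal flag are taken from the description above; the 5-bit terminal address width and the placement of the address in the most significant bits are assumptions, and the function names are hypothetical.

```python
def unpack_status_word(value16):
    """Split a 16-bit Status word into its four fields (address width assumed 5)."""
    return {
        "address": (value16 >> 11) & 0x1F,        # terminal address
        "message_error": (value16 >> 10) & 0x1,   # set if preceding message failed
        "status_codes": (value16 >> 1) & 0x1FF,   # nine bits for status codes
        "terminal_flag": value16 & 0x1,           # terminal requests attention
    }


def parity_ok(value16, parity_bit):
    """True when the 16 bits plus the parity bit contain an odd number of ones."""
    return (bin(value16 & 0xFFFF).count("1") + parity_bit) % 2 == 1
```

A bus controller could use `parity_ok` to reject a corrupted word and then inspect `message_error` and `terminal_flag` to decide whether to request self-test information from the terminal.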

The Command, Data, and Status words are combined to form three message formats as
shown in Figure 6. A bus controller to remote terminal transfer is accomplished by the
controller sending a receive Command word followed contiguously by the Data words. After
a specified gap time, the addressed terminal responds with its Status word, thus
indicating proper receipt of the message.

A terminal  to  controller  transfer  is  initiated  by  the  controller  sending  a transmit 
Command  word  to  the  remote  terminal.  After  the  gap  time,  the  addressed  terminal  responds 
with  a Status  word  contiguously  followed  by  the  specified  number  of  Data  words. 

The direct terminal-to-terminal transfer may be effected by a combination of the
two previous commands. It is important to note that continuous "handshaking" between
remote terminals and the bus controller provides a constant system health monitoring
function, thus providing an inherent built-in-test (BIT) capability of the MUX bus.
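
The three message formats can be summarized as word sequences. In this sketch the controller-to-terminal and terminal-to-controller sequences follow the description above (gap times are omitted). The ordering shown for the terminal-to-terminal case is an assumption about how the two commands are combined, and all names are hypothetical.

```python
def controller_to_terminal(n_data):
    """Receive command and data from the controller, then the terminal's status."""
    return [("BC", "receive command")] + [("BC", "data")] * n_data + [("RT", "status")]


def terminal_to_controller(n_data):
    """Transmit command from the controller, then the terminal's status and data."""
    return [("BC", "transmit command"), ("RT", "status")] + [("RT", "data")] * n_data


def terminal_to_terminal(n_data):
    """Assumed combination: both commands, sender's status and data, receiver's status."""
    return (
        [("BC", "receive command"), ("BC", "transmit command"), ("RT-tx", "status")]
        + [("RT-tx", "data")] * n_data
        + [("RT-rx", "status")]
    )


def word_count(message):
    """Total number of words on the bus for one message."""
    return len(message)
```

Every format ends with, or includes, a Status word from each participating terminal, which is the "handshaking" that gives the controller a continuous view of terminal health.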

The system designer using MIL-STD-1553A still retains a high degree of flexibility
in his design approach due to the nature of the standard. His options include selecting
the redundancy scheme (dual, triple, or quad; Figure 4), determining the bus controller
implementation, identifying criteria for selection of a signal for transmission on the
bus, defining the physical configuration of the bus system, and providing functional
operation algorithms. Due to its flexibility, MIL-STD-1553A has received wide acceptance
by industry and DoD, and has already been applied to a large number of military weapon
systems.

APPLICATION  BENEFITS  OF  THE  DATA  BUS 

MIL-STD-1553A is being applied to a variety of current weapon systems, both as an
integration tool to reduce aircraft wiring and to increase reliability and flexibility,
and also as a standard digital interface to subsystems. The following systems are
using, or contemplating using, MIL-STD-1553A: F-16 (Figure 7), F-18, DAIS, LAMPS, RPV,
AMST, Army helicopters, and ATF. NATO AGARD is investigating commonality between
international multiplex data buses.

If multiplexing is used as an integration technique in weapon systems, great
standardization benefits can be realized. Multiplexing defines the way information flows
on the data bus and also the protocol needed to accomplish orderly data transfers. This
enables one to define two standard interfaces: one to the avionic subsystems that have
digital/discrete inputs/outputs, and the other directly to the data bus (twisted pair)
for subsystems that have the multiplexed remote terminals built in (Figure 3). All this
is defined in MIL-STD-1553A.

Other standardization tasks are now underway in the area of airborne computers and
their application and support software, multipurpose controls and displays, and sensors
with standard interfaces. Because digital techniques are expanding into areas that were
formerly purely electrical and/or mechanical in nature (e.g., electrical power control
and flight control), the AF Digital Avionics Study has proposed to change the
definition of "avionics" to "airborne electronics". This is why processors, software and
multiplexing are appearing in so many new disciplines. Because not all the electronics
for these disciplines are always procured simultaneously, much proliferation can occur
in the air vehicle hardware. That is why multiplexing is so readily accepted. It
gives the systems engineer a building block approach to system design and permits
standardization throughout the total weapons system. Subsystems not yet available can
easily be simulated and then substituted later when delivered, i.e., systems integration
is no longer delayed until all parts are delivered.

Another benefit derived from the MUX standard is that it permits one to establish
an in-house systems engineering capability. In order to get a better handle on avionics,
one can investigate the feasibility of performing systems analysis in-house, thus
accomplishing your own top level/functional systems definition and architecting. This
type of system simulation allows one to investigate the data flow correctness and systems
performance based on data accuracy, timing and availability. Redundancy and graceful
degradation can be realistically simulated to find out critical parameters and system
sensitivity to various signals and their accuracy. This approach to systems
integration/architecting reduces cost by avoiding over-specification of equipment,
reduces risk through early systems simulation prior to specification writing, and
increases reliability through performing failure analysis and applying well defined
built-in-test (BIT) and graceful degradation/redundancy schemes.

Other benefits that multiplexing provides can be realized in system retrofits and
in the maintenance and logistics area. Systems retrofits are more easily accomplished
because of the standard interfaces of subsystems and time sharing of aircraft wiring
(i.e., the MUX bus). In the maintenance area, the data bus enables dynamic testing of
on-board equipment and standard ground support equipment. In the logistics area the prime
benefit is fewer inventory parts (at all levels) and increased reliability of the
hardware.


In summary, by first standardizing the multiplexed data bus (MIL-STD-1553A), and
following it up with similar standards that are developed for reducing
proliferation in computers and software, the following overall benefits can be gained:

1. Systems integration/architecting will be simplified.

2. Standardization of interfaces, hardware and software will reduce proliferation.

3. Retrofit and maintenance of digital avionic systems will become easier.

4. Reliability of hardware will be increased and the resultant reduction of
inventory equipment, parts and documentation will reduce life cycle costs.

5. There will be more international competition in avionics hardware/software
because of the standard/compatible/interchangeable sensors that can be developed
off-line.


Therefore, it is felt that even though the multiplex data bus is only a portion of
the overall digital avionics concept, it has laid the cornerstone for the concept by
providing a standard integration concept with standard equipment interfaces that is
reconfigurable and technology independent, and as a result the many benefits listed in
this paper can be realized.


SUMMARY


This  paper  discusses  highly  reliable  fault-tolerant  computer  systems  for  use 
in  flight-critical  avionic  and  control  systems  of  future  commercial  transport 
aircraft. Such aircraft are envisioned to have integrated systems, to be
terminally configured, and to be equipped with fly-by-wire flight control systems, all
of  which  require  highly  reliable,  fault-tolerant  computers.  Two  candidate 
computer  architectures  are  identified  as  having  the  potential  of  satisfying  the 
commercial  transport  aircraft  requirements: 

• Software-Implemented Fault-Tolerance (SIFT) by Stanford Research
Institute (SRI)

• Parallel-Hybrid Redundant Multiprocessor by Charles Stark Draper
Laboratory (CSDL).

The  system  context  of  such  computers  indicates  the  need  for  some  central 
complex  that  assures  system  collaboration  and  survival,  which  is  the  intended 
use  of  the  SRI  and  CSDL  concepts.  The  most  stringent  requirement  imposed  on 
these computers is that the probability of failure should be less than 10^-9
for a flight duration of ten hours. Descriptions of the SRI and CSDL concepts
are presented and the most critical design issues are discussed. Reliability
estimates are presented for the SRI and CSDL concepts showing their ability
to meet this stringent requirement.


INTRODUCTION 

NASA, in the past few years, has considered the application of advanced concepts of aerodynamics,
terminal area control, and flight control [1, 2, 3] to commercial transport aircraft. Many of these
concepts have shown a need for sophisticated avionic and control systems employing computers, with some
of the applications having safety-critical requirements. An initial study [4, 5] was conducted to
consider the design of computer architectures, to include in particular:

• The  computational  and  reliability  requirements  of  an  advanced  transonic  commercial  transport 
aircraft  using  fly-by-wire  techniques  with  a unified  digital  computing  system 

• The  impact  of  modern  digital  circuit  technology  on  the  design  of  such  a computer 

• The  identification  of  candidate  architectures  for  a computer  to  satisfy  the  requirements. 

From  this  effort,  two  candidate  architectures  were  identified  as  having  the  potential  of  satisfying 
the  commercial  transport  aircraft  requirements: 

• A newly conceived architectural concept, Software Implemented Fault Tolerance (SIFT) by Stanford
Research Institute (SRI)

• An existing architectural concept, a Parallel-Hybrid Redundant Multiprocessor by Charles Stark
Draper Laboratory (CSDL).

Additional  efforts  are  being  conducted  whose  objectives  are: 

• To  develop  the  SRI  and  CSDL  design  concepts  to  a point  at  which  their  potential  reliability  may 
be  evaluated  with  reasonable  accuracy 

• To  investigate  alternate  strategies  for  physical  implementation  using  available  or  specially 
designed  components 

• To  prove  or  substantiate  the  correctness  of  the  hardware  and  software  designs 

• To model the systems and evaluate their effectiveness in tolerating faults.

To achieve the above goals, the current research was directed at the critical aspects of the
designs, leaving less critical aspects to a later phase in the research program. Since this is an
ongoing research investigation, the design and specification of the SRI and CSDL systems are not complete.

This paper discusses the context of the system within which the computer functions and the most
stringent requirement imposed on such computers, provides a description of the SRI and CSDL computer
concepts, and discusses their ability to meet the most stringent requirements.


SYSTEM  CONTEXT 


For  contemporary  transport  aircraft  systems,  redundancy  occurs  in  the  form  of  independent  and  often 
dissimilar  subsystems,  as  shown  in  Figure  1.  Subsystem  failure  is  a routine  occurrence  and  in  most 
cases  does  not  have  a catastrophic  impact  on  flight  safety.  To  implement  systems  in  this  manner,  the 
flight  crew  is  employed  as  a system  integrator  with  ultimate  responsibility  for  failure  detection, 
identification  and  recovery  for  the  vehicle. 

The  implementation  of  substantially  more  sophisticated  and  autonomous  controls  anticipated  for 
future  transport  aircraft  calls  for  capabilities  that  require  the  creation  of  an  integrated  system.  It 
is  appropriate  to  regard  the  aircraft  avionics  and  controls  as  a system  of  sensors  and  effectors  where 
information  processing  elements  derive  inputs  from  sensors  and  generate  control  signals  to  effectors. 
Effectors, which are displays and actuators, operate upon the man-machine environment via dynamics and
human  responses  thus  producing  effects  that  are  measured  by  sensors. 

With the future incorporation of such life-critical functions as active controls, total fly-by-wire,
and system management, the system embraces non-critical subsystems, critical subsystems, and
critical self-survival of the system. A significant aspect of the system architecture, therefore,
involves the assignment of redundancy and survivability to the portions of the system. Survival of the
aircraft will require the survival of minimum levels of sensors and effectors, motive power, structural
integrity, and the information processing system. Survival of the information system will require not
only the survival of minimum levels of processing for sensors and effectors but also
the survival of system integrity, which is the unambiguous successful collaboration of surviving modules.
Therefore,  the  survival  of  the  aircraft  depends  not  only  on  the  survival  of  a suitable  minimum  subset 
of  the  sensors  and  effectors  but  also  on  the  ability  of  the  system  to  "mobilize"  them  into  coordinated 
action.  Such  mobilization  requires  redundancy  management  wherein  the  vehicle  dynamics  are  controlled 
via  surviving  effectors  on  the  basis  of  information  derived  from  surviving  sensors. 

Since  the  occurrence  of  unlikely  events  can  never  be  ruled  out,  the  design  of  such  a system  must 
resort  to  probabilistic  criteria.  Without  going  into  detail,  we  can  identify  broad  characterizations 
such  as  Mean  Time  Between  Failure  (MTBF),  failure  independence  (i.e.,  absence  of  correlation  among 
failures  in  distinct  subsystems),  and  robustness  in  the  sense  of  recovering  from  all  failures  for  which 
it might potentially be possible to recover. The last of these items cannot be directly quantified.
It requires the anticipation of all permutations of failures that will occur in the lifetime of a fleet
of systems and the creation of "contingent modes" of operation.

The  desirable  characteristics  mentioned  in  the  preceding  paragraph  for  systems  and  subsystems 
suggest  the  following  design  guidelines  for  critical  integrated  systems.  First,  sensors  and  effectors 
should  be  diversified  to  the  point  where  failure  correlations  are  adequately  small.  Diversification 
should  not  be  excessive,  however,  as  it  is  counterproductive  to  the  second  guideline,  modularity. 
Modularity means the use of identical modules for multiple functions as well as redundancy and serves
to  fulfill  MTBF,  logistics,  and  contingency  requirements.  Modularity  is  beneficial  as  it  tends  to 
minimize  the  variety  of  contingency  modes. 

The  highly  survivable  system  could  in  principle  be  located  in  a central  complex  with  dedicated 
connections  to  each  sensor  and  effector.  As  a practical  matter,  a certain  degree  of  distribution  of 
the  system  is  inevitable.  Most  of  the  sensors  in  use  today,  as  well  as  those  envisioned  for  the  future, 
rely  on  electronics  for  manipulation,  interpretation,  and  testing.  A good  example  of  a future  effector 
is the development of more fuel efficient engines through the use of computers as engine controllers.

In  times  past,  the  notion  of  system  integration  carried  the  implication  of  using  a single  large  computer 
on  a shared  basis  for  as  much  of  the  processing  as  possible.  Rather  than  trying  to  push  information 
processing  towards  a central  facility,  the  tendency  today  is  to  take  advantage  of  the  potential  for 
distribution.  This  means  not  only  the  employment  of  specialized  electronics  for  each  subsystem  but 
also  the  use  of  an  appropriate  amount  of  digital  computer  processing  local  to  and/or  dedicated  to  the 
subsystem. 

System  integrity  can  in  principle  be  embodied  in  a wholly  distributed  system  to  some  degree; 
however,  incorporation  of  adequate  redundancy  and  contingency  for  critical  self-survival  without  possible 
detrimental  effects  from  the  non-critical  subsystems  poses  significant  survival  problems.  Therefore, 
the  subject  system  architecture  would  employ  a fault-tolerant  central  computer  for  the  maintenance  of 
system  integrity.  The  adoption  of  a fault-tolerant  central  computer  into  the  system  structure  makes  it 
possible  to  assign  both  the  redundancy  management  and  the  contingencies  in  an  unambiguous  way  to  a 
hardware-software  entity  whose  failure  and  nonavailability  are  highly  improbable. 

With this system approach, the computer also serves those digital computation functions that involve
the coordination of separate subsystems which are both critical and non-critical. Between the central
computer and the local processors is a data communication facility which can be composed of non-critical
elements, but whose partial survival is critical. Representative system structures are shown in
Figure 2. Three different fundamental fault-tolerant data communications structures can be
distinguished. They are the dedicated or star connection of Figure 2a, the bus connection of Figure 2b,
and the network connection of Figure 2c [6]. Each approach has advantages and disadvantages, which
will not be treated here in detail.

It may be presumed that all three of these methods will employ serial data transmission and that
the interface to each subsystem will be comparable for the different methods. The interface at the
central computer's end is virtually identical for the three systems, differing only in the number of
input-output access channels needed. There is no reason why all three of these methods cannot be used
in different parts of the same system.

Within  the  system  context  previously  described,  the  fault-tolerant  central  computer  can  be 
characterized  as  having  the  primary  function  of  simply  surviving  with  secondary  functions  of  system 
redundancy  management  and  contingency  management.  In  addition  to  these  functions,  the  central  computer 
has  other  natural  system  roles  owing  to  its  hierarchical  position  in  the  system  structure.  These  roles 
include  the  coordination  of  sensor-effector  activities  of  which  a digital  autopilot  is  an  excellent 
example.  In  some  cases  this  function  alone  can  account  for  a substantial  fraction  of  the  computing 
resources.  Another  role  is  that  of  command  where  there  may  be  interaction  with  the  flight  crew. 


REQUIREMENTS

A number of studies [5, 7] have been conducted to determine the requirements placed upon a computer
for  use  in  future  commercial  transports.  The  purposes  of  these  studies  were  to  develop  estimates  of  the 
computational  requirements  imposed  by  contemporary  and  projected  future  avionics  and  aircraft  functions 
and  also  to  identify  the  safety  levels  of  the  computations.  These  estimates  are  intended  to  be  used  for 
determining  the  computational  size  and  power  required  of  the  digital  computer,  to  specify  other  con- 
straints such  as  reconfiguration  time,  and  to  aid  in  developing  a specification  for  the  reliability 
of  the  computer. 

Table I is a summary of the present and projected future aircraft functions and their safety
criticality levels by flight phase. The flight phases constitute the major operational modes for which a
computer must provide task allocation and scheduling among the processors and memories. Not shown by
Table  I are  the  additional  tasks  required  for  system  redundancy  and  contingency  management,  and  computer 
self  survival.  The  details  of  the  tasks  will  not  be  presented  here;  however,  there  are  several  important 
characteristics  of  the  aircraft  function  task  set.  First,  it  should  be  recognized  that  different  aircraft 
will  place  different  computational  loads  upon  a computer.  Second,  the  set  of  tasks  to  be  computed  varies 
in  the  amount  of  computation  and  in  the  speed  in  which  it  must  be  carried  out.  Third,  all  of  the 
computational  tasks  are  repetitive  in  nature.  Fourth,  the  tasks  that  must  be  carried  out  fastest  are 
tasks  with  small  programs  and  data  sets.  Fifth,  for  a typical  task  set  the  amount  of  data  flow  from  one 
task  to  another  is  low. 

Typically,  a central  computer  might  have  placed  upon  it  by  the  aircraft  functions  the  following 
computational  and  sizing  requirements: 

• Processor  Speed:  200  - 500  thousand  instructions  per  second 

• Memory Size: 16 - 24 thousand words.

From Table I, it is seen that the highest safety levels are imposed on projected future functions for
stability and control. These safety levels are governed by the Federal Aviation Administration
regulations for the design of flight control systems and other equipment, systems and installations in
commercial transports, and the safety level is stated as "The occurrence of any failure condition which
would prevent the continued safe flight and landing of the airplane is extremely improbable". A number
of less than or equal to 1 x 10^-9 has been imposed in recent certification programs to represent an
extremely improbable event. However, the application of the 10^-9 number to a system or subsystem is
subject to interpretation and is usually examined on an individual basis by the FAA. For research
purposes, the reliability requirement for the SRI and CSDL computers was established as follows: the
probability of system failure should be less than 10^-9 for the longest flight duration of ten hours.
This is the most stringent interpretation that could be anticipated and places a high availability
requirement on the computer.
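
To give a feel for how demanding this requirement is, the following sketch converts it into an equivalent constant failure rate and compares it with a single non-redundant computer. The constant-rate (exponential) failure model and the 10,000-hour MTBF used for the simplex computer are assumptions chosen for illustration; neither figure comes from the studies cited above.

```python
import math


def max_failure_rate(p_fail_max: float, mission_hours: float) -> float:
    """Largest constant failure rate (per hour) such that
    1 - exp(-rate * mission_hours) does not exceed p_fail_max."""
    return -math.log1p(-p_fail_max) / mission_hours


def mission_failure_probability(mtbf_hours: float, mission_hours: float) -> float:
    """Probability that a unit with the given MTBF fails during the mission,
    under the assumed exponential failure model."""
    return -math.expm1(-mission_hours / mtbf_hours)


# Requirement from the text: less than 1e-9 over a ten-hour flight.
required_rate = max_failure_rate(1e-9, 10.0)

# Hypothetical simplex computer with a 10,000-hour MTBF (illustrative value).
simplex_probability = mission_failure_probability(10_000.0, 10.0)

print(f"equivalent failure rate: {required_rate:.3e} per hour")
print(f"simplex ten-hour failure probability: {simplex_probability:.3e}")
```

Under these assumptions the requirement corresponds to roughly one failure per 10^10 hours, about six orders of magnitude beyond the illustrative simplex unit, which is why replication and reconfiguration rather than component quality alone are needed to approach it.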

The architecture of the central computer is strongly driven by the urgency of preserving a valid
data stream. Failure to do so could have catastrophic consequences such as loss of the aircraft state
vector, loss of configuration data, or loss of command. For this reason, data that is transferred,
processed, or stored is manipulated in replica. This data replication in combination with the task
set requires a high degree of parallelism in the computer. Also, the potential task set is large and in
the extreme case places a requirement for high computer productivity. Finally, to meet the different
application possibilities, a requirement for expansion and/or contraction of the computer was established.

There  is  no  unique  parallel  computer  design.  It  can  be  a multiprocessor,  a multicomputer,  or  a 
combination  of  the  two.  Additionally,  numerous  architectural  alternatives  exist.  However,  motivations 
for  multiprocessors  are  typically  to  increase  productivity  and  availability  at  the  same  time  although 
these two purposes are largely competitive. Parallelism, productivity, availability, and expandability
are intrinsic to the multiprocessor and through a previous study [4, 5], multiprocessors have shown the
potential  of  satisfying  the  commercial  transport  aircraft  requirements. 


SOFTWARE-IMPLEMENTED FAULT-TOLERANCE

In recent years, a number of fault-tolerant architectures [8, 9, 10, 11] have been devised, analyzed,
and in some cases, implemented. Most of these architectures depend heavily on special hardware structures
to achieve their fault-tolerance. While hardware mechanisms are often useful for certain aspects of
error detection and correction, they are limited in the kinds of faults they can treat. Also, such
mechanisms cannot be easily modified to reflect changes in performance and reliability requirements.

The SIFT (Software-Implemented Fault-Tolerance) computer [4, 12] is founded on a new approach to
fault-tolerant computing that puts strong emphasis on software techniques for achieving reliability with
corresponding de-emphasis on special hardware. The software that is critical to the reliability of the
system is designed in accordance with a hierarchical design methodology [13, 14, 15] that allows one to
state and prove properties relating to the system's correctness. A Markov process model is used to
analyze SIFT's reliability as a function of various error-detection and reconfiguration strategies.
The reliability model is incorporated into SIFT's formal description, permitting one to show that the
model indeed reflects the behavior of the system.

The  SIFT  Design 

This section will describe the SIFT computer design, the manner of its operation and also the way
in which the principal parameters of the design were derived from the requirements and the technological
factors.

The major units of the computer are shown in Figure 3. The central processing units and associated
memories are indicated by P and M respectively. The Input/Output processors and their associated
memories are indicated by P_IO and M_IO. The P and M units form a computing module, with the
connection between them made in a conventional manner by a high bandwidth bus. This connection enables the
processor to obtain data and instructions and to place data and results in the memory unit. The
computing modules are connected to each other by a system of several busses (in Figure 3, three such
busses are shown, but this number can be varied). These intermodule busses are unidirectional in that
a module may only read data from another module and may not write into the memory of another module.
This restriction prevents a faulty module from having an adverse effect on operative modules, thereby
providing fault isolation between modules. The input/output subsystem (i.e., the P_IO and M_IO units)
is also connected to the intermodule busses and its operation is described in the following discussion.

Computational tasks are placed in modules with tasks of high criticality being replicated in
several modules while less critical tasks are placed in a lesser number of modules. This is illustrated
in Figure 4. Within each module the tasks are multiprogrammed according to a fixed schedule. No attempt
is made to carry out tight synchronization of the modules. It is merely necessary that the
synchronization is sufficient for the timing constraints of the application and for the error detection and
correction strategies that are used.
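
The placement of replicated tasks can be sketched as follows. The text states only that more critical tasks are replicated in more modules; the replication counts, the task and module names, and the least-loaded placement heuristic used here are hypothetical choices made for illustration.

```python
from typing import Dict, List


def place_tasks(replication: Dict[str, int], modules: List[str]) -> Dict[str, List[str]]:
    """Assign each task to as many distinct modules as its replication level,
    always choosing the currently least-loaded modules (illustrative heuristic)."""
    load = {m: 0 for m in modules}
    placement: Dict[str, List[str]] = {}
    # Place the most heavily replicated (most critical) tasks first.
    for task, copies in sorted(replication.items(), key=lambda kv: -kv[1]):
        if copies > len(modules):
            raise ValueError(f"not enough modules to replicate {task!r}")
        chosen = sorted(modules, key=lambda m: (load[m], m))[:copies]
        for m in chosen:
            load[m] += 1
        placement[task] = chosen
    return placement


placement = place_tasks(
    {"flight_control": 3, "navigation": 2, "display": 1},
    ["M1", "M2", "M3", "M4"],
)
```

Because each replica of a task lands in a different module, a single module failure removes at most one copy of any task.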

Within each module there is a local executive that controls all the operations within that module.
The local executive includes scheduling, dispatching, error detection, error correction and error
reporting. The coordination of all the modules is carried out by a global executive that is placed in
only some of the modules. This global executive is responsible for reconfiguration and the associated
diagnosis and check-out functions that must be carried out in the event of a fault, or when there must
be a change in the task set to be computed, such as occurs during change of flight phase of the aircraft.
This global executive is replicated in a sufficient number of modules to achieve the necessary
reliability of the system as a whole.

The  basic  scheme  for  error  detection  and  correction  is  to  carry  out  each  calculation  in  a number  of 
modules  and  for  the  processor  of  each  module  to  place  the  results  in  its  own  memory.  The  results  are 
validated  at  the  time  that  they  are  used  rather  than  when  they  are  computed.  Any  task  that  uses  the 
result  of  some  task  (including  possibly  the  next  iteration  of  that  same  task)  reads  the  several  versions 
of  the  data  it  needs  and  carries  out  a comparison  of  the  several  versions.  If  all  versions  are  the  same, 
then  the  presumption  is  that  no  error  has  occurred  and  the  calculation  proceeds.  This  sequence  of 
operations is illustrated in Figure 5. If one of the versions of the data is found to be different, then
it  is  presumed  that  it  is  in  error.  In  this  event,  the  majority  vote  of  the  versions  of  the  data  is 
used  and  the  calculation  proceeds  after  taking  note  of  the  fact  that  a data  discrepancy  exists.  This 
note  like  any  other  data  is  written  into  the  memory  of  the  module  that  detected  the  discrepancy.  At  some 
later  but  short  time,  the  next  repetition  of  the  global  executive  will  read  the  notes  left  by  the  other 
modules  that  report  errors  that  they  have  detected.  The  global  executive  then  initiates  a diagnostic 
program  that  identifies  the  unit  that  is  faulty  based  upon  the  error  reports  that  the  global  executive 
has  read  from  all  the  modules. 
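
The validate-on-use scheme can be sketched as a small voting routine. This is a minimal illustration rather than the SIFT implementation: the function name, the dictionary representation of the versions, and the decision to raise an error when no majority exists are assumptions.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple


def vote_on_read(versions: Dict[str, Hashable]) -> Tuple[Hashable, List[str]]:
    """Compare the versions of a data item read from several modules.

    Returns the majority value together with the names of the modules whose
    version disagreed; these names play the role of the discrepancy notes that
    the reading module writes into its own memory for the global executive.
    """
    counts = Counter(versions.values())
    value, votes = counts.most_common(1)[0]
    if votes <= len(versions) // 2:
        raise RuntimeError("no majority among the versions read")
    notes = sorted(m for m, v in versions.items() if v != value)
    return value, notes


# Three modules computed the same task; one result differs.
value, notes = vote_on_read({"M1": 42, "M2": 42, "M3": 41})
```

The calculation proceeds with the majority value in either case; the notes only matter later, when the global executive collects them for diagnosis.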

In  the  event  that  this  diagnosis  perceives  that  a processing  module  is  faulty,  reconfiguration  takes 
place  to  remove  that  unit  from  further  effect  on  the  system.  The  reconfiguration  takes  place  according 
to  the  following  scheme. 

1.  The  global  executive  receives  error  reports  from  the  several  local  executives. 

2.  A diagnostic  routine  is  started  which  identifies  the  processing  unit  that  is  at  fault. 

3. The global executive determines those changes in task schedules that must be carried out to
provide adequate redundancy for the task set. These changes are noted in tables that are in
the memories of the processing modules that are computing the global executives.

4. The local executives in each of the modules (including possibly the faulty module) read the
tables of schedules from the several versions of the global executives and on the basis of
these tables change the task set on which they are computing.
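
The four steps above can be sketched in outline. The text does not specify the diagnostic algorithm or the rescheduling policy, so both are assumptions here: diagnosis blames the module accused by a majority of reporters, and each lost replica is moved to the least-loaded healthy module. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def diagnose(error_reports: Dict[str, List[str]]) -> Optional[str]:
    """Steps 1 and 2: combine the error reports left by the local executives.
    A module is declared faulty only if a majority of reporters accuse it
    (assumed rule; self-accusations are ignored)."""
    accusations: Counter = Counter()
    for reporter, suspects in error_reports.items():
        for suspect in set(suspects):
            if suspect != reporter:
                accusations[suspect] += 1
    if not accusations:
        return None
    suspect, votes = accusations.most_common(1)[0]
    return suspect if votes > len(error_reports) // 2 else None


def reconfigure(schedule: Dict[str, List[str]], modules: List[str],
                faulty: str) -> Dict[str, List[str]]:
    """Step 3: build a new schedule table that restores the replication level
    of every task that lost a copy, where a healthy module is available."""
    healthy = [m for m in modules if m != faulty]
    load = Counter(m for mods in schedule.values() for m in mods if m != faulty)
    new_schedule: Dict[str, List[str]] = {}
    for task, mods in schedule.items():
        kept = [m for m in mods if m != faulty]
        if len(kept) < len(mods):
            candidates = [m for m in healthy if m not in kept]
            if candidates:
                replacement = min(candidates, key=lambda m: (load[m], m))
                kept.append(replacement)
                load[replacement] += 1
        new_schedule[task] = kept
    return new_schedule
```

Step 4 is then a matter of each local executive reading the new table from the several copies of the global executive, with the same voting applied to the table as to any other data.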

The  global  executives  referred  to  above  are  replicated  versions  of  the  same  program  and  as  such  they 
should  all  produce  the  same  tables  of  schedules.  In  the  event  that  one  of  the  global  executives  is 

running  in  the  processing  module  that  is  at  fault,  then  the  possible  #*rroneous  results  from  that  version 

will  be  ignored  by  the  voting  process  that  takes  place  whenever  a module  reads  data  from  any  other  module. 
Thus,  the  reliability  mechanisms  that  are  used  for  the  global  executives  are  the  same  as  those  used  f-:r 
all  other  task  programs  and  the  global  executive  does  not  form  a .special  "hard  core"  that  needs  ar.y 
special  or  different  treatment  to  achieve  reliable  operation. 
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'fhe  inter-module  busses  are  used  for  communication  between  modules,  ’hhen  data  has  to  be  read  from 
other  modules,  the  bus  used  for  each  version  of  the  data  will  be  chosen  so  ti.at  a different  bus  is  used 
for  each  version  of  the  data.  One  faulty  bus  will  therefore  not  produce  erroneous  results  in  the  system 
because  the  voting  process  in  each  module  will  over-ride  the  non-agreeing  data  item.  The  error  reports 
that  are  left  by  a module  for  the  global  executive  to  read  will  contain  notes  about  the  busses  that  were 
used  for  data  transfer  of  the  non-agreeing  data  item.  On  the  basis  of  these  data,  the  diagnostic  program 
in  the  global  executive  can  decide  that  it  is  a bus  that  is  faulty  rather  than  a processing  module.  The 
faulty  bus  is  removed  from  further  use  by  changing  the  tables  that  designate  which  busses  are  to  be  used. 
These tables in the global executive are read by the local executives, which then change their own bus
utilization  tables  thereby  ceasing  to  use  the  faulty  bus. 
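
The voting and error-reporting behavior described above can be sketched as follows. This is a minimal illustration, not the SIFT implementation: the tuple layout, the function names vote and diagnose, and in particular the diagnosis rule (blame a bus when disagreements from several modules share one bus) are assumptions introduced here.

```python
from collections import Counter

def vote(versions):
    """Majority-vote replicated data read from other modules.

    versions: list of (module_id, bus_id, value), one entry per replica.
    Returns (majority_value, reports); reports lists the (module_id, bus_id)
    pairs whose value disagreed with the majority.
    """
    counts = Counter(value for _, _, value in versions)
    majority, _ = counts.most_common(1)[0]
    reports = [(m, b) for m, b, v in versions if v != majority]
    return majority, reports

def diagnose(reports):
    """Hypothetical diagnosis rule (an assumption, not from the source).

    If the disagreeing items came from several modules but over one bus,
    suspect the bus; if they all came from one module, suspect the module.
    """
    modules = {m for m, _ in reports}
    busses = {b for _, b in reports}
    if len(busses) == 1 and len(modules) > 1:
        return ("bus", next(iter(busses)))
    if len(modules) == 1:
        return ("module", next(iter(modules)))
    return ("undetermined", None)
```

With three replicas, a single faulty bus or module is outvoted, and the accumulated reports give the diagnostic program the evidence it needs to choose which table entry to change.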

Many  fault-tolerant  computer  designs  that  use  replication  to  achieve  fault-detection  and/or  fault- 
correction  demand  that  the  various  versions  of  a computation  of  a task  be  carried  out  at  identically  the 
same time (so-called "lock-step" operation). This implies that a system-wide clock of high reliability
be  included.  In  SIFT,  this  is  avoided  by  only  demanding  a loose  form  of  synchronization  derived  from 
the  timing  constraints  of  the  task  set.  The  basic  synchronization  rule  is  that  no  task  may  commence  an 
iteration of its computation before all the other replications of that task have completed the previous
iteration. This means that at any time the results of a computation are always available in the system
for some recent iteration. In order for this scheme to work, it is necessary for each computation to use
two  buffers  for  the  results.  While  one  buffer  contains  the  complete  results  of  one  iteration,  the  other 
buffer  is  being  filled  with  the  results  of  the  next.  The  synchronization  rule  quoted  above  ensures  that 
the  first  buffer  will  not  be  overwritten  until  the  next  iteration  completes.  The  use  of  this  loose  form 
of  synchronization  has  benefits  in  that  a system-wide  transient  of  short  duration  will  be  unlikely  to 
affect  all  of  the  processing  modules  in  the  same  way  thereby  preventing  the  occurrence  of  multiple 
correlated  errors  which  cannot  be  detected  by  simple  replication  and  voting. 
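
The synchronization rule and the two-buffer scheme can be sketched as below. The class name Replica and its methods are hypothetical; the sketch only assumes that each replica can observe how many iterations the other replicas of the same task have completed.

```python
class Replica:
    """One replication of a task, with two result buffers (illustrative)."""

    def __init__(self):
        self.buffers = [None, None]  # alternate between the two buffers
        self.completed = 0           # number of iterations finished so far

    def may_start(self, group):
        # Rule: do not begin the next iteration until every replica of the
        # task has completed the iteration this replica finished last.
        return all(r.completed >= self.completed for r in group)

    def run_iteration(self, group, result):
        if not self.may_start(group):
            raise RuntimeError("other replicas have not finished the previous iteration")
        # Fill the buffer that does NOT hold the most recent complete result.
        self.buffers[self.completed % 2] = result
        self.completed += 1

    def latest_result(self):
        # A complete result is always available once one iteration is done.
        if self.completed == 0:
            return None
        return self.buffers[(self.completed - 1) % 2]
```

Because a replica can run at most one iteration ahead of the slowest replica, the buffer holding the last complete result is never the one being filled.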

The  interconnection  between  the  modules  and  the  busses  can  be  by  a high  impedance  connection  so  that 
damage propagation from one type of unit to another can be prevented. It is also feasible to use an
optical coupling between units as a means of preventing damage propagation. It must be shown in a
fault-tolerant computer that a faulty unit cannot disrupt the system by any actions that it takes. The
software  voting  that  is  used  in  the  SIFT  design  provides  such  protection  for  all  computational  errors. 
However,  it  is  also  necessary  to  show  that  control  actions  by  one  unit  cannot  cause  disruption  of  other 
units.  All  units  of  the  system  are  constrained  from  forcing  other  units  to  carry  out  an  action.  They 
may  only  request  action  from  other  units,  and  the  other  unit  protects  itself  by  only  acting  on  requests 
that  it  can  carry  out  without  disruption  of  its  own  operation.  An  example  of  this  is  the  fact  that  a 
module may only read from another module's memory and may not write into it. One form of disruption would
occur  if  a faulty  unit  made  excessive  requests  upon  another  unit  thereby  preventing  it  from  carrying 
out  other  actions  including  the  serving  of  requests  from  yet  other  units.  This  is  prevented  by  logic  in 
each unit that scans all units that request service and honors each request for only a small amount of
time before continuing to scan the other units. This mechanism exists in two places. First, a bus will
only  serve  a requesting  processor  for  a one-word  transfer  before  checking  whether  other  processors  also 
require  service.  Second,  a memory  unit  within  a module  will  only  transfer  one  word  to  a bus  before 
serving  other  busses  that  may  request  service.  This  mechanism  ensures  that  a unit  that  becomes  faulty 
cannot prevent other units from serving other requests. As this request for data is the only control
function  that  spans  more  than  one  unit,  the  design  provides  fault  isolation  across  units. 
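
The one-word-at-a-time scanning discipline can be illustrated with a short sketch. The function name and the dictionary of pending word counts are assumptions made for the example; the real mechanism is hardware logic in each bus and memory unit.

```python
from collections import deque

def serve_round_robin(requests):
    """Serve one word per requester per scan.

    requests: dict mapping requester name -> number of words requested.
    Returns the order in which single-word transfers are granted.
    """
    pending = {name: n for name, n in requests.items() if n > 0}
    queue = deque(pending)
    order = []
    while queue:
        name = queue.popleft()
        order.append(name)       # grant exactly one word
        pending[name] -= 1
        if pending[name] > 0:
            queue.append(name)   # rejoin the scan behind the others
    return order
```

Even if a faulty requester asks for far more words than the others, each well-behaved requester is served within one scan of the remaining units.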

The input/output subsystem is connected to the bus system as shown in Figure 3. The fault-tolerance
techniques  that  are  used  are  as  follows: 

• Critical  sensors  are  replicated,  and  the  programs  that  require  the  data  read  all  the  versions  and 
carry  out  a voting  procedure. 

• Critical  actuators  are  replicated  and  each  of  them  contains  sufficient  local  logic  that  they  can 
read  the  several  versions  of  the  output  data  that  they  require  and  carry  out  local  voting  possibly 
by  mechanisms  similar  to  those  currently  employed  with  multiple  actuators  on  aircraft  (e.g., 
forced  sum  voting). 

• Non-critical  sensors  and  actuators  are  not  replicated  but  are  connected  to  SIFT  in  the  same  way  as 
critical ones in order to preserve the same fault-isolation rules on the input/output subsystem as are
used  between  processing  modules. 

For  critical  input  and  output  units,  the  data  to  and  from  the  SIFT  system  flows  on  a multiple  bus 
system  which  is  connected  to  the  main  bus  system  of  SIFT  via  logic  that  is  realized  by  a specially 
programmed microprocessor (see Figure 3).

Each microprocessor operates in a similar manner to the main processors of SIFT except that the
tasks that are to be performed are much smaller and the executive that resides in them is a reduced
version  of  the  executives  in  the  central  processors.  The  reductions  that  are  made  are  as  follows: 

• No  global  executive  is  present  in  the  microprocessors,  as  the  functions  normally  performed  by  it 
are  either  not  necessary  or  are  carried  out  by  the  global  executive  of  the  central  processors. 

• The  local  executive  contains  only  the  voter,  scheduling,  and  dispatching  functions  that  enable  it 
to  determine  its  schedules  by  reading  the  central  global  executive  tables. 

In  all  other  respects  the  I/O  processors  operate  according  to  the  same  general  rules  as  the  central 
processors.  These  operations  include  voting  on  multiple  inputs  to  achieve  error  detection  and  correction, 
reconfiguration by change of scheduling tables, and the restriction that a processor may only read data
from  other  processors  and  may  not  write  into  the  memory  of  other  processors. 

In a SIFT system that is carrying out both critical and non-critical tasks, it is necessary to
maintain a separation between the tasks because the non-critical tasks may not receive as much validation
and verification as the critical tasks and thus may corrupt them. This is achieved by the use of different
units that connect input and output units to the main bus system. The units are used to read
from sensors and deposit the results of the read operation in their own memories. These results can then
be read by the main processors of SIFT. This scheme effectively isolates potentially unreliable sensors
from the other units of the system. Non-critical output units are handled by an analogous scheme in which
these units read from the main memories and transmit data to those actuators.

The  Design  Methodology 

The SIFT design has been specified in accordance with a formal design methodology that originated with
D. Parnas [16, 17] and has been extensively developed at SRI (Robinson et al. [13]). The chief reasons
for using such a medium were (1) to impose a discipline on the design process assuring a
clearly-structured, easily modified design, (2) to simplify verification of the correctness of that design, and
(3) to facilitate the analysis of certain reliability properties. Previous use of the methodology has
been  concerned  with  only  the  first  two  of  these  aims.  The  SIFT  effort  is  the  first  instance  of  its  use 
in  connection  with  reliability  modeling. 

The  methodology  can  be  viewed  as  a formalization  of  Dijkstra's  stepwise  refinement  concept  [15].  The 
central idea is that of decomposing the design into a hierarchy of abstract virtual machines or Parnas
modules. The highest modules in the hierarchy provide an abstract, global description of the system's
capabilities. Modules at lower levels of the hierarchy serve as building blocks for implementing the
highest-level module. Modules at still lower levels are building blocks for implementing those at
intermediate levels. The modules lying near the top of the hierarchy thus tend to be highly abstract
while those at or near the bottom tend to be more concrete. In the SIFT design, for example, descriptions
of real machine hardware appear at the bottom level, and a set-theoretic model of the workings of the
system  appears  near  the  top. 

Each module in the hierarchy is specified in terms of a set of abstract data structures and a set
of operations that change the values of these structures. At any given moment, the state of the module
is determined by the aggregate of the values of its data structures. Operation calls thus cause
transitions from one state to another. The data structures and operations of each module are specified
in a formal way. The specifications describe what happens when each of the functions of a module is
called. Specifications for operations consist of assertions, such as logical formulas that relate the
state of the module before an operation call to the state resulting from the call. Module
specifications have other aspects [19] that are not directly relevant to this discussion.

Each module (other than those at the very bottom of the hierarchy) is abstractly implemented in
terms of those modules lying immediately beneath it in the hierarchical ordering. First, a mapping
function is specified that maps states of the implementing modules to a state of the implemented
module. The mapping function thus represents the data structure of the implemented module in terms of
those of the implementing modules. The selection of mapping functions for an implemented module
corresponds to deciding on the data structures for that module. Next, for each operation of the
implemented module, an abstract program is specified. The abstract program is expressed in terms of the
data structure and operations of the implementing modules and is intended to mimic calls of the
implemented function as a sequence of calls to operations in the lower-level modules.

Two types of proof can be carried out according to the methodology. First, the modules that
constitute the top level in the hierarchy contain the interface functions, i.e., the functions that the
system provides to users. Based only on the specifications of these modules certain properties of the
system can be deduced. Second, the implementation of the modules can be proven correct level-by-level.
Each abstract program is proven correct with respect to the specifications of the operations it
implements and with respect to the appropriate mapping function. Using the methodology, the proof of a
large  system  can  be  reduced  to  that  of  many  small  programs. 
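
A toy example may help fix the roles of the specification, the mapping function, and the abstract program. Everything here is invented for illustration (a set module implemented on a list module); it is not part of the SIFT specifications, and the check performed is exhaustive testing over a few states, which is far weaker than the proofs the methodology calls for.

```python
def mapping(lower_state):
    """Mapping function: a lower-level list state represents an abstract set."""
    return frozenset(lower_state)

def spec_insert(before, after, x):
    """Specification of the upper-level operation insert(x), written as an
    assertion relating the abstract state before and after the call."""
    return after == before | {x}

def program_insert(lower_state, x):
    """Abstract program: implements insert(x) using only list operations."""
    if x in lower_state:
        return list(lower_state)
    return list(lower_state) + [x]

def implementation_ok(lower_state, x):
    """Check one instance of the correctness condition: mapping the lower
    state before and after the program must satisfy the specification."""
    before = mapping(lower_state)
    after = mapping(program_insert(lower_state, x))
    return spec_insert(before, after, x)
```

The correctness condition has the same shape at every level of the hierarchy, which is why the proof of a large system decomposes into many small ones.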

Structure  of  SIFT  Software 


The  logical  structure  of  SIFT  is  depicted  in  Figure  6.  It  consists  of  a hierarchical  layering  of 
modules,  which  we  designate  as  system  modules,  and  some  programs — namely  the  application  tasks  and  the 
global  and  local  executives — that  utilize  the  facilities  of  the  external  interface  of  the  system  modules. 
For simplicity we will say that the tasks call the functions of the interface. Each of the system
modules may be considered as an abstract machine that maintains a state and provides operations to
modify the state. The application tasks and executive may then be considered as programs that run on
the abstract machines. The data required by the tasks is distributed among the system modules.

The structure shown in Figure 6, with a few exceptions, appears in each of the processors.
Each task, including the global executive, executes in some subset of the processors. Thus the fault
status and fault schedules modules, which are accessed only by the global executive, appear only in
processors executing the global executive. Naturally, the state of a module at a given instant varies
according  to  the  processor  with  which  it  is  associated. 

Figure  6 depicts  some  additional  inputs — clock-tick,  timer  and  faults.  These  are  operations  of 
particular  modules  that  can  be  viewed  as  being  "called"  by  processes  that  are  external  to  the  system 
and  operate  synchronously  with  the  processing  of  tasks. 

Reliability  Model  of  SIFT 

In our discussion of the design methodology, it was noted that the top-level modules provide a
complete external description of the capabilities of the system. Properties asserting the correctness
of the system are stated and proved relative to the specification of these modules alone. The other
modules in the design have no other purpose than to facilitate the implementation of the top-level
module.  The  proof  of  correctness  of  the  implementation  is  important  only  in  that  it  guarantees  that 
the  specifications  of  the  top-level  module  are  satisfied. 

These specifications, of course, relate only to the functional behavior of the system, not to its
reliability  properties.  The  reliability  model,  on  the  other  hand,  describes  the  probability  of  certain 
failures  without  any  particular  reference  to  the  functioning  of  the  system  in  the  event  of  such  failures. 
Clearly,  the  meaningfulness  of  the  reliability  model  depends  heavily  on  this  behavior.  In  order  to  have 
any confidence in the applicability of the model, it is necessary to demonstrate that its various states
do in fact correspond to the appropriate states of the top-level module of the design. In particular, it
must be proved that the sequences of events that lead to the Fail State in the reliability model exactly
correspond  to  sequences  of  events  that  lead  to  failure  in  the  top-level  module  of  the  system. 

To  facilitate  this  proof,  the  reliability  model  is  formulated  as  an  integral  component  of  the 
hierarchical  design.  A new  module,  called  the  reliability  module,  is  positioned  above  the  former  top- 
level  module  replacing  that  module  as  the  most  abstract  description  of  the  system.  The  new  module's 
only  data  structure  encodes  the  state  of  the  reliability  model  and  its  operation  models  failure  events 
and  reconfigurations. 

In  the  use  of  the  SIFT  system  for  aircraft  control,  the  significant  states  of  the  system  will  depend 
on changes of flight phase and on any faults that have occurred. The dynamic nature of the
fault-tolerance techniques that are used (e.g., varying task replication at varying times) makes it important to
analyze  these  different  system  states.  For  each  state  of  the  system,  the  probability  of  transition  to 
other  states  is  a function  of  the  variables  that  describe  the  state  presently  occupied,  e.g.,  number  of 
remaining  good  processors.  This  model  of  the  system  behavior  is  particularly  attractive  because  of  its 
close parallel to the operation of the real system and also because of its mathematical tractability.

Calculations based on this reliability model show (Figure 7) that a SIFT system with five
processors and four busses initially would have less than 10^-9 probability of failing to have sufficient
computing resources at the end of a ten-hour flight. This meets the required reliability objectives as
previously  discussed. 
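
The flavor of such a state-based model can be conveyed by a deliberately simplified sketch in which the state is the number of good processors and the transition probability depends only on that number. The failure rate, the minimum number of processors, and the function names are illustrative assumptions; the sketch ignores busses, transient faults, and reconfiguration time, and it does not reproduce the calculation behind Figure 7.

```python
import math

def prob_insufficient(n_initial, n_min, rate_per_hour, hours, steps=5000):
    """Probability that fewer than n_min good processors remain at the end.

    State n = number of good processors. In each small time step dt, state n
    moves to n - 1 with probability n * rate_per_hour * dt (assumed model).
    """
    dt = hours / steps
    p = [0.0] * (n_initial + 1)
    p[n_initial] = 1.0
    for _ in range(steps):
        nxt = p[:]
        for n in range(1, n_initial + 1):
            move = p[n] * n * rate_per_hour * dt
            nxt[n] -= move
            nxt[n - 1] += move
        p = nxt
    return sum(p[:n_min])

def closed_form(n_initial, n_min, rate_per_hour, hours):
    """Cross-check: with independent exponential failures and no repair, the
    number of survivors is binomially distributed."""
    q = -math.expm1(-rate_per_hour * hours)  # per-processor failure probability
    s = 1.0 - q
    return sum(math.comb(n_initial, k) * s**k * q**(n_initial - k)
               for k in range(n_min))
```

For example, prob_insufficient(5, 3, 1e-4, 10) steps the chain for a hypothetical five-processor system that needs three good processors, and closed_form(5, 3, 1e-4, 10) gives the corresponding binomial value for comparison.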

Concluding  Remarks 

In  the  SIFT  design,  the  major  parameters  are  derived  from  the  requirements  and  from  the  opportunities 
that  are  presented  by  recent  technological  advances. 

The  use  of  software  for  error  detection  and  correction  is  possible  because  the  loose  connection 
between  tasks  implies  that  the  amount  of  data  that  must  be  moved  from  task  to  task  is  small.  Software 
reconfiguration  is  also  possible  because  the  time  to  accomplish  it  is  acceptable  when  viewed  from  the 
time  constraints  on  the  tasks  and  the  fact  that  the  fastest  tasks  tend  to  be  small.  This  allows  for 
the  movement  of  complete  tasks  from  one  module  to  another. 

The  low  data  transfer  that  is  necessary  between  tasks  allows  for  a bus  structure  that  is  slow 
thereby allowing the use of mechanisms that achieve fault isolation between units.

The  low  cost  of  modern  electronics  allows  the  policy  of  reconfiguring  on  the  basis  of  complete 
computing  modules  or  busses.  Many  previous  fault-tolerant  computer  designs  carried  out  such 
reconfiguration on the basis of much smaller units. Such designs tend to be very complex and are
currently unjustified in view of the low costs of modern electronics.

In  addition  to  the  fact  that  the  design  is  driven  by  the  requirements  and  technological  advances, 
there are many other advantages that stem from the use of software as the principal technique for
achieving  fault  tolerance.  These  include: 

• The  degree  of  fault  tolerance  can  be  different  for  different  tasks  within  the  task  set. 

• The  degree  of  fault  tolerance  can  be  different  for  the  same  task  at  different  times,  e.g., 
during  different  flight  phases. 

• The  total  computational  power  available  to  the  tasks  can  be  varied  by  changing  the  number  of 
computing  modules. 

• There  are  no  special  fault-tolerance  restrictions  on  the  processing  modules  and  therefore 
standard  off-the-shelf  units  can  be  used  with  the  benefit  that  they  will  have  experienced  far 
more  thorough  validation  and  verification  than  specially  designed  units. 

The SIFT concept embodies a number of ideas whose usefulness extends beyond the particular
application  for  which  the  system  was  intended.  Because  conventional,  off-the-shelf  processing  units 
comprise  the  bulk  of  the  hardware,  the  core  system  can  be  easily  and  inexpensively  adapted  to  a broad 
range of needs. However, because the degree of reliability achieved by the system depends on the number
of  processors  used  and  on  scheduling  strategies  rather  than  on  built-in  aspects  of  the  design,  it  can 
be  varied  according  to  the  performance  and  cost  requirements  of  the  application. 

The  use  of  a formal  design  medium  for  purposes  of  specification,  validation,  and  reliability 
modeling can be expected to play an important role in future designs of fault-tolerant computers. While
a system might make extensive use of redundancy, unless the software or hardware mechanisms that manage
the redundancy are correct, the system will still be unreliable. Similarly, the formulation and use of
elaborate reliability models is to little avail if it cannot be assured that these models actually
reflect  the  behavior  of  the  system.  We  believe  that  SIFT  constitutes  a major  step  in  the  direction  of 
fault-tolerant  systems  whose  correctness  and  reliability  can  be  verified. 

A PARALLEL-HYBRID REDUNDANT MULTIPROCESSOR


The physical organization of the parallel-hybrid redundant multiprocessor is substantially more
complex  than  a nominal  multiprocessor  organization.  A simplified  module  diagram  of  the  computer  is  shown 
in  Figure  8.  Superficially,  this  diagram  appears  the  same  as  a nominal  multiprocessor.  The  principal 
differences  are  that  the  busses  for  memory  and  interface  access  are  redundant  and  that  the  actual  number 
of  modules  is  three  times  the  number  of  nominal  modules  plus  some  number  of  spares. 

All  activity  is  conducted  by  triads  of  modules  and  triads  of  busses.  A module  triad  is  formed  by 
associating  any  three  like  modules  with  one  another.  This  means  that  any  module  can  serve  as  a spare  for 
any  triad.  Such  flexibility  permits  the  best  possible  utilization  of  surviving  modules.  A single  triad 
of  bus  lines  is  active  at  any  one  time  for  each  of  the  memory  and  interface  accesses.  In  other  words,  a 
three-member  subset  of  N bus  lines  is  chosen  on  a quasi-static  basis  to  serve  as  a bus  triad. 

Every  module  of  every  kind  is  able  to  receive  data  from  all  incident  bus  lines  and  contains  a 
decision  element  to  formulate  a corrected  version  of  bus  data.  It  is  necessary  for  each  module  to  know 
which  three  bus  lines  are  the  active  ones.  These  three  lines  are  connected  to  a voter  in  each  module, 
thus constituting a Triple Modular Redundant (TMR) element. The three active bus lines carry three
independently-generated versions of the data with each version coming from a different member of the
triad that is transmitting the data. To accomplish this, it is necessary to assign each module to
transmit on one specific bus line. Now if totally flexible module configuration is to be possible, it
follows that the assignment of a module's transmission to a single bus line must be quasi-static and
reconfigurable.
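
The decision element in each module can be pictured as a bitwise majority function over the three active bus lines. The sketch below models bus words as integers; the function name is an assumption, and the real voter is a hardware circuit operating on the bus data.

```python
def tmr_vote(a, b, c):
    """Bitwise two-out-of-three majority of three versions of a bus word.

    Each output bit takes the value held by at least two of the inputs, so a
    single erroneous version is masked.
    """
    return (a & b) | (a & c) | (b & c)
```

For instance, if two lines carry 0b1010 and the third carries a corrupted 0b0110, the voted word is still 0b1010.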

In  addition  to  the  redundancy  described  in  the  preceding  few  paragraphs,  the  redundant  organization 
differs from the nominal one by virtue of the inclusion of independent submodules called bus guardian
(BG) units in each processor, memory, and input-output access unit. Guardians are charged with governing
the status of their associated modules. This includes power-on status, memory bus triad and transmission
selection,  and  certain  self-test  configuration  selections. 

Each of the functions of the guardian has the characteristic that its failure modes have safe
directions as well as unsafe ones. By biasing the failure modes toward the safe directions, it is
possible to increase the probability of system survival. In general, the safe failure modes of a
module are power-off and bus transmission disconnected. To bias in this direction, one can employ
redundant guardians in each module, and require agreement among them to establish power-on and bus
transmission  enable. 
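
The biasing toward the safe direction amounts to requiring unanimity before a module is powered or allowed to transmit. A minimal sketch follows, assuming each guardian contributes a single Boolean enable output; the function name and the list representation are illustrative only.

```python
def module_enabled(guardian_outputs):
    """Enable power-on or bus transmission only if every guardian agrees.

    guardian_outputs: list of booleans, one per redundant guardian.
    A guardian stuck at 'enable' cannot turn the module on by itself, while a
    guardian stuck at 'disable' forces the safe (off or disconnected) state.
    """
    return len(guardian_outputs) > 0 and all(guardian_outputs)
```

The cost of this bias is that a single guardian failure in the disable direction silences an otherwise healthy module, which then appears to the system as a passive module failure.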

The connection of bus guardians is illustrated in Figure 9. It should first be noted that the
guardian principle depends heavily on fault independence. Therefore each guardian derives its power, its
bus inputs, and its timing reference independently of all other guardians. It is moreover physically
isolated from all other guardians and all modules. A particularly critical area from the isolation
viewpoint is the control of the module's transmission interface onto the various bus lines. The bus
isolation gates (BIGs) must be highly independent of one another as must the guardian's enable signals
to  these  gates.  This  is  one  of  the  crucial  electrical  and  mechanical  design  aspects  of  the  entire  computer. 

Bus  guardians  are  addressable  as  part  of  the  common  memory  address  space  and  are  capable  of 
receiving messages from any processor triad via the active memory bus triad. A message to a guardian
contains commands which are staticized by the guardian and applied to its outputs until superseded by
a new command message. In this way, the probability is remote that a failed module can assert more
than one erroneous data stream. As a result, correct data can be determined by the bus voters, and the
malfunctioning module can be switched to a silent state. It is noted in passing that certain failures
of  a bus  isolation  gate  can  render  a bus  line  useless,  in  which  case  that  active  bus  triad  must  be 
reconfigured  to  use  a spare  line.  However,  most  guardian  failures  appear  as  passive  failures  of  the 
processor,  memory,  or  input-output  access  unit  to  which  the  particular  guardian  unit  pertains. 

Guardians  are  used  as  agents  to  convey  the  computer's  configuration  authority  to  all  elements  of  the 
computer.  They  are  highly  secure  against  the  random  or  willful  malfunction  of  any  single  active 
transmitting  module.  They  make  possible  highly  flexible  reconfiguration. 

All modules and busses are organized into triads. In the case of processors and memories, there can
be  numerous  triads  in  existence  at  the  same  time,  but  only  one  memory  bus  triad  and  only  one  interface 
bus  triad.  Each  processor  triad  acts  as  one  functional  processor,  of  which  several  can  work  in  parallel. 
Each  memory  triad  acts  as  a page  of  memory,  of  which  several  can  exist  at  one  time,  but  only  one  can 
communicate  at  a time  with  a processor  triad. 

When a processor fails, its triad will attempt to complete its current job step, which it will be
able to do unless a second failure prevents it. The period of vulnerability to a second failure will
be a fraction of a second. When the job step is complete, one of the processor triads is assigned the
task of reconfiguring the injured triad. When the erroneous module is identified, it is removed by
commands to its guardians. If a spare is available, it is connected to the appropriate bus by its
guardians, likewise upon command by the processor triad assigned to the reconfiguration. Triad identity
will be assigned to the spare processor by a direct message. If no spares are available, the injured
triad is retired. The resources of the multiprocessor are diminished by one processing unit, and the
two unfailed members of the former triad are now available to be used as spares if further failures
occur. 
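
The bookkeeping side of this processor reconfiguration policy can be sketched as follows. The data structures (a dictionary of triads and a list of spares) and the function name are assumptions for illustration; the guardian commands and the direct message that assigns triad identity are not modeled.

```python
def reconfigure(triads, spares, failed):
    """Update triad membership after processor `failed` is identified.

    triads: dict mapping triad name -> list of three processor ids.
    spares: list of idle processor ids. Both are modified in place.
    """
    owner = next((name for name, members in triads.items() if failed in members), None)
    if owner is None:
        # The failed unit was an idle spare: simply drop it from the pool.
        if failed in spares:
            spares.remove(failed)
        return
    survivors = [m for m in triads[owner] if m != failed]
    if spares:
        # Replace the failed member with any spare (processors are anonymous).
        triads[owner] = survivors + [spares.pop(0)]
    else:
        # Retire the injured triad; its two good members become spares.
        del triads[owner]
        spares.extend(survivors)
```

Retiring a triad when no spare exists costs one functional processor but replenishes the spare pool with two units, so the next failure elsewhere can be repaired immediately.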


The situation is much the same for memory modules. The principal difference is that memories are
not anonymous. In fact, a read-only memory module is totally dedicated to its assigned function and
cannot be used as a spare. When a read-only memory triad is injured by the loss of a memory module, a
read-write memory module can be used as a spare. It must be loaded to agree with the surviving triad
members before a second failure occurs. If no spare is available, the triad is reduced to a dyad which
is vulnerable to the next failure, at which time one memory page is lost. This is a significant
departure from the flexibility offered by the anonymous processor triads. The eventuality of read-only
memory failure must clearly be covered by the inclusion of adequate spares, either read-write memories
for flexible pooled use or extra dedicated copies of read-only memory.

Figure 8 indicates the existence of input-output access modules connected to the internal interface
bus  and  also  to  the  external  environment.  It  should  be  pointed  out  that  the  external  interfaces  of 
the  computer  could  alternatively  support  dedicated,  bussed,  or  networked  link  structures  to  the  sensor 
and  effector  subsystems.  The  redundancy  structure  at  this  point  depends  on  the  redundancy  desired  in 
the  external  interface. 

The  simplest  conceptual  structure  is  for  a triple-redundant  interface,  such  as  a redundant  external 
bus,  where  the  triple  module  redundancy  structure  is  extended  through  to  the  subsystem  interfaces. 

Each  external  bus  line  can  be  dedicated  to  a different  input-output  access  module,  which  in  turn  is 
assigned by its guardian units to transmit on one of the active interface bus lines. More complex
variants  are  possible  in  which  each  access  module  performs  error  correction  by  voting  on  incoming  data 
from  the  external  bus. 

When an external interface is non-redundant, the strategy would be to assign it to a single access
module  where  the  module  would  transmit  on  all  three  active  interface  bus  lines.  A malfunctioning  access 
module  could  pollute  the  entire  interface  bus,  but  with  suitable  encoding  and  protocol  there  would  be  no 
serious  consequences  to  the  state  of  the  system.  The  offending  access  module  could  be  discovered  and 
disconnected  by  bus  guardian  commands  conducted  over  the  memory  bus,  the  major  penalty  being  a time  loss 
on  the  remainder  of  the  input-output  interface  of  the  computer.  For  dedicated  links,  the  loss  of  the 
link  is  non-critical  by  hypothesis.  For  a network,  whose  survival  is  assumed  critical,  the  computer 
must interface with the network in several places via several distinct access modules. Each such
interface would be simplex, but the system would survive the failure of all but one of them.

The  employment  of  independent  redundancy  requires  some  form  of  synchronization  among  the  independent 
data  sources.  Soft,  or  loose  synchronization  involves  such  operations  as  buffering,  comparing  or  voting, 
signalling  consensus,  and  marking  completed  intervals.  These  can  all  be  done  by  program  when  given 
suitable  intermodule  data  links.  Hard  or  tight  synchronization  involves  hardware  comparison  or  voting 
and a common time reference, whereas loose synchronization can employ separate time references.

Tight  synchronization  is  employed  in  the  parallel  hybrid  redundant  multiprocessor.  It  provides  the 
basis  for  solving  some  problems  and  presents  some  problems  of  its  own.  A common  time  reference,  or  clock, 
that  supports  hardware  voting  allows  instantaneous  validation  of  internal  data,  configuration  control, 
and,  in  some  cases,  interface  data.  In  this  way,  it  helps  to  make  the  redundant  multiprocessor  resemble 
the nominal one, which is advantageous to programmers at all levels.

The fault-tolerant clock [10] shown in Figure 10 consists of a set of independent phase-locked
oscillators arranged so that the failure of one or more of the oscillators (up to a design limit) does
not destroy the phase lock of the survivors. The clock signal from each oscillator is distributed to
every module and guardian so that each can make an independent determination of clocking edges. These
independent  determinations  are  made  by  circuits  called  clock  receivers.  In  normal,  nonfailed  operation, 
the  outputs  of  all  the  clock  receivers  are  in  phase  lock  with  each  other  and  with  all  the  oscillators. 

The  same  phase  lock  holds  when  an  oscillator  fails.  The  failure  of  a clock  distribution  line  appears 
as an oscillator failure, and the failure of a clock receiver appears as a failure of the module or
guardian  that  contains  it. 

Fault Detection, Identification and Recovery

The  central  computer  is  designed  to  have  a highly  improbable  loss  of  capability.  One  can  roughly 
quantify  this  statement  by  saying  that  one  of  these  computers  should  exhibit  a total  failure  rate  of 
less  than  10“^  in  a flight  of  up  to  ten  hours.  This  virtually  rules  out  the  use  of  ordinary  triple 
modular redundancy, as the MTBF's achievable in large scale production have been consistently too low
for  such  reliability  without  replacement  of  failed  modules.  Therefore  some  form  of  hybrid  redundancy 
is  needed.  In  a simplistic  view,  hybrid  redundancy  works  by  substituting  a spare  the  first  time  the  TMR 
voters disagree. This view has the shortcoming of not taking latency of faults into account. That is,
the  first  fault  may  not  result  in  any  voter  disagreements;  whereas  when  combined  with  a second  fault, 
it  may  frustrate  recovery.  A pre-requisite  for  achieving  highly  improbable  failure  in  a hybrid  system 
is to expose latent faults by systematic exercising, or "flexing," of all logic elements. The question
remains  of  how  often  such  flexing  must  occur.  Hopkins  and  Smith  [21]  have  shown  that  the  flexing 
period  must  be  of  the  order  of  seconds  for  a reasonably  sized  system  with  module  MTBF's  in  the  ten- 
thousand  hour  range.  Clearly,  then,  flexing  cannot  be  relegated  to  pre-flight  checkout,  but  must  rather 
be conducted routinely in flight. An ordinary hybrid TMR system cannot routinely test itself while
performing critical functions, as it is vulnerable during these times. A parallel hybrid TMR system
can do so, with testing becoming an integral part of the computer's architecture.
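
The scale of the problem with plain TMR can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions: a constant hazard rate, a perfect voter, no replacement of failed modules, and the ten-thousand hour module MTBF and ten-hour flight quoted in this section. The function names are hypothetical and do not come from the original design.

```python
import math


def module_reliability(mtbf_hours: float, mission_hours: float) -> float:
    """Probability that one module survives the mission (constant hazard rate assumed)."""
    return math.exp(-mission_hours / mtbf_hours)


def tmr_failure_probability(mtbf_hours: float, mission_hours: float) -> float:
    """Probability that at least two of three modules fail, so the voter is outvoted.

    Assumes independent module failures, a perfect voter, and no spares.
    """
    r = module_reliability(mtbf_hours, mission_hours)
    q = 1.0 - r
    # Exactly two modules fail (three ways to pick the survivor) or all three fail.
    return 3.0 * q * q * r + q ** 3


if __name__ == "__main__":
    p = tmr_failure_probability(mtbf_hours=10_000.0, mission_hours=10.0)
    print(f"single-triad failure probability over the flight: {p:.2e}")
```

With these assumed figures a single unrepaired triad fails with a probability of roughly 3 x 10^-6 per flight, and a computer built from several such triads would do worse, which is the motivation for replacing failed modules with spares.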

The  latency  problem  poses  an  interesting  design  dilemma.  Redundancy  is  employed  to  mask  the  effects 
of  faults  upon  the  system  as  a whole.  But  redundancy  requires  flexing  of  all  logic  and  requires  that 
all  possible  faults  that  are  created  by  flexing  be  made  visible  to  the  system  and  not  masked.  The 
resolution  of  this  dilemma  requires  reconfiguration  of  all  independent  system  elements  plus  the 
selective generation of faulty symptoms to verify detection mechanisms. This is why, for example,
quadded  or  interwoven  [22]  logic  is  not  proposed  for  this  application  as  it  cannot  be  reconfigured  and 
tested on line. The same holds for many of the error-correcting coded memory and arithmetic units
that  have  been  designed. 

In  the  parallel-hybrid  redundant  multiprocessor,  an  error  correction  mechanism  exists  in  every 
module  in  the  form  of  a voter.  Each  voter  must  be  tested  routinely  to  ensure  that  its  error  correcting 
capability is undiminished. Of all the voters, only those in the processor modules have the additional
capacity to detect as well as to correct errors. This is not to save equipment. Rather, the processor
is the only kind of module in the computer that can utilize the information. Processor bus voters
under normal conditions will correct single bus errors and will set error latches to indicate which
of the buses was in disagreement. At this time, the processor can record the identity of the nominal
user of the bus for diagnostic purposes. A processor triad can flex its own voters during a test job
step  by  having  each  triad  member  purposely  utter  independent  bus  data  that  causes  all  possible  kinds 
of  bus  errors.  To  pass  the  test,  all  triad  members  must  receive  the  same  data,  form  the  same 
corrected  result,  and  indicate  the  same  disagreement  patterns  in  their  error  latches.  This  is  a 
relatively  simple  test  procedure  which  can  be  conducted  by  a processor  triad  under  test  while  other 
triads  carry  on  normal  functions.  In  a sense  it  qualifies  the  triad  to  conduct  further  testing  in  which 
the  triad’s  voters  are  the  decision  elements.  The  remainder  of  the  system  testing  function  is  carried 
out  under  the  assumption  that  the  processor  voters  and  error  latches  are  operational.  The  test 
process  involves  the  conversion  of  every  fault  into  an  error  by  making  calculations  whose  results  are 
sensitive  to  each  logic  variable.  Each  bus  and  module,  including  voters,  guardians,  isolation  gates, 
clock  receiver,  oscillators,  and  data  and  power  interfaces  must  be  exercised  in  depth. 
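
The voter behavior and the voter flexing test described above can be sketched in software. This is a minimal model, not the hardware design: it assumes a bitwise two-out-of-three majority vote on each bus word and one error latch per bus line, and the names `vote` and `flex_voter` are hypothetical.

```python
from typing import Tuple


def vote(a: int, b: int, c: int) -> Tuple[int, Tuple[bool, bool, bool]]:
    """Bitwise 2-of-3 majority vote over three bus words.

    Returns the corrected word and one error latch per bus line; a latch is
    set when that line disagreed with the voted result.
    """
    voted = (a & b) | (a & c) | (b & c)
    latches = (a != voted, b != voted, c != voted)
    return voted, latches


def flex_voter(word: int = 0b10101010, width: int = 8) -> bool:
    """Exercise the voter with every single-bit error on every bus line.

    The test passes only if each error is corrected and exactly the latch of
    the disturbed line is set.
    """
    for line in range(3):
        for bit in range(width):
            words = [word, word, word]
            words[line] ^= 1 << bit  # inject one deliberate bus error
            voted, latches = vote(*words)
            expected = tuple(i == line for i in range(3))
            if voted != word or latches != expected:
                return False
    # The error-free case must leave all latches clear.
    return vote(word, word, word) == (word, (False, False, False))


if __name__ == "__main__":
    print(vote(0b1010, 0b1010, 0b0101))
    print("voter flex test passed:", flex_voter())
```

In the scheme described in the text each triad member would run such a check on its own voter and the three members would then compare results; the sketch models only one voter.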

Processor  testing  involves  fairly  conventional  self-test  approaches  except  that  coverage  needs 
to  be  higher  than  that  which  is  typically  obtained  in  computers.  A guideline,  then,  for  processor 
design  is  to  eliminate  obscure  and  pattern-sensitive  sequences  as  much  as  possible.  The  cache  memory  is 
also  tested  by  a conventional  program  approach.  Address  faults  and  pattern  sensitivities  present 
the  most  important  problems  to  be  solved. 

Memory module testing is similar to cache memory testing. The memory voters are tested by sending
single-error  messages  to  a memory  triad  over  the  memory  bus  and  verifying  correct  responses  from  the 
triad  members. 

Input-output  access  modules  are  also  tested  by  messages  from  processors.  Where  voters  are  used, 
they  are  tested  in  a manner  similar  to  memory  voters.  Simplex  access  units  are  tested  in  conjunction 
with  input-output  links. 

Guardians  are  tested  by  reconfiguration  commands  and  their  voters  are  tested  the  same  as  memory 
voters  by  erroneous  commands. 


The  clocking  system  presents  a unique  testing  problem  because  it  is  nearly  separate  from  the  data 
handling  elements  and  because  the  testing  of  clocking  circuitry  is  fundamentally  different  from  the 
testing  of  other  logic  circuits  which  are  testable  once  a valid  clock  exists.  Latencies  in  the 
oscillators  can  occur  in  the  phase-locking  circuitry.  This  is  probably  the  most  vulnerable  area  for 
in-flight  testing.  If  the  a priori  probability  of  failure  in  phase  lock  circuits  is  sufficiently  low, 
it may be possible to perform these tests only at preflight time. Clock receiver latencies, however,
can  be  tested  in  one  module  at  a time  with  minimal  system  vulnerability. 

We  might  summarize  the  fault  detection  process  as  the  arrival  of  disagreement  errors  at  the  voters 
of  a processor  triad  which  has  been  stimulated  by  normal  or  test  activity.  The  detection  of  a fault 
initiates  the  process  of  fault  identification  which  is  the  discovery  of  the  module,  bus,  or  other 
isolated  element  in  which  the  failure  resides.  During  the  testing  process  for  latent  faults,  there  is 
relatively  little  ambiguity  in  the  determination  of  faulty  modules.  In  normal  operation,  however,  an 
error  on  the  bus  can  come  from  a number  of  sources.  The  identification  of  the  faulty  module  generally 
requires the "rounding up of suspects," that is, the listing of elements that transmit on the disagreeing
bus. If a module fault is permanent, the module can be found by moving it to another bus, since the error
follows the module. If the bus is faulty, reconfiguration will not move the error to another bus.

Intermittent  faults  are  less  easy  to  identify.  When  the  source  of  an  error  eludes  detection  by 
disappearing, all of the suspect elements are assigned one demerit, and a reconfiguration is then made
to  distribute  the  suspects  evenly  on  different  buses.  Subsequent  error  occurrences  and  reconfigurations 
will  cause  a preponderance  of  demerits  to  accumulate  in  the  name  of  the  faulty  module. 
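
The demerit strategy can be sketched as follows. This is an illustration only: the module names, the bus numbering, and the round-robin rule used to spread the suspects are assumptions made for the example, and the text does not specify how the suspects are actually redistributed.

```python
from collections import Counter
from typing import Dict, List


def record_error(demerits: Counter, assignment: Dict[str, int], bad_bus: int) -> List[str]:
    """Give one demerit to every module transmitting on the disagreeing bus."""
    suspects = sorted(m for m, bus in assignment.items() if bus == bad_bus)
    for module in suspects:
        demerits[module] += 1
    return suspects


def spread_suspects(assignment: Dict[str, int], suspects: List[str], n_buses: int) -> Dict[str, int]:
    """Reassign the suspects round-robin so that they sit on different buses."""
    new_assignment = dict(assignment)
    for i, module in enumerate(suspects):
        new_assignment[module] = i % n_buses
    return new_assignment


def simulate(faulty: str, assignment: Dict[str, int], n_buses: int, rounds: int) -> Counter:
    """Let an intermittent fault in `faulty` show up once per round."""
    demerits: Counter = Counter()
    for _ in range(rounds):
        bad_bus = assignment[faulty]  # the error appears on the faulty module's bus
        suspects = record_error(demerits, assignment, bad_bus)
        assignment = spread_suspects(assignment, suspects, n_buses)
    return demerits


if __name__ == "__main__":
    start = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1, "G": 2, "H": 2, "I": 2}
    print(simulate("C", start, n_buses=3, rounds=3).most_common())
```

In this run the faulty module collects a demerit on every occurrence, while each innocent module collects one only when it happens to share a bus with the faulty one, so the faulty module's total pulls ahead after a few reconfigurations.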

The  recovery  process  is  one  of  assignment  and  initialization  for  modules,  and  voter  and  transmitter 
selection  for  buses.  These  are  all  accomplished  by  the  bus  guardian  units  upon  receipt  of  commands  from 
active  triads  executing  system  software.  Recovery  can  take  place  even  if  single  errors  are  present 
on  the  buses.  In  principle,  therefore,  an  injured  processor  triad  can  reconfigure  itself. 

Tolerance  Renewal 


The  primary  advantage  of  hybrid  redundancy  over  TMR  is  that  injured  triads  are  reconfigured  back  to 
a state  where  they  can  once  again  mask  malfunctions.  This  is  a process  of  tolerance  renewal.  In 
principle, the system failure rate is restored to its design value by the reconfiguration process. If
reconfiguration  were  to  fail,  the  system  failure  rate  would  increase  possibly  by  many  orders  of 
magnitude. 

In  practice,  there  are  several  ways  in  which  an  injured  triad  can  fail  to  be  reconfigured.  These 
include  exhaustion  of  spare  modules,  malfunction  of  the  reconfiguration  mechanism,  failure  to  detect 
the  need  to  reconfigure,  and  perhaps  the  use  of  a defective  spare  module.  We  can  characterize  the 
process  of  tolerance  renewal  as  the  detection  and  location  of  any  physical  malfunction,  the  removal  of 
vulnerability  from  the  triad  containing  the  malfunction,  the  replacement,  by  spares,  of  functions  thus 
removed,  and  the  initialization  of  the  reconstituted  triad.  All  mechanisms  involved  in  this  process 
are subject to malfunction, of course, and such malfunctions constitute injury to their triads and
require  that  tolerance  renewal  be  carried  out. 

The idealized renewal of tolerance, together with a sufficient complement of spares, has an
interesting theoretical consequence if the hazard rate is constant. Figure 11 illustrates this concept.
The system failure probability increases monotonically with time in the absence of tolerance renewal.
At any arbitrary point in time, if the system is tested and found to be perfect, the reliability becomes
equal to one. If the test reveals an injury, then the system is reconfigured to a state of virtual
perfection and the reliability is likewise equal to one. This paradoxical result depends on the
absence of any deterioration of the renewal mechanism, however. The concept's utility is to suggest
to the reader that the dynamics of redundancy management are all-important in maintaining the system
reliability at a level that is unreachable by non-redundant elements.

The actual system behavior differs in several respects from this concept. First, the supply of
spares is not inexhaustible. Second, the tolerance renewal mechanism is subject to degradation.
Finally, the ability to test the system is limited by its probabilistic nature. These three items
will be briefly discussed in the paragraphs following.

The  supply  of  spares  is  a degree  of  freedom  available  to  the  system  designer.  In  the  parallel- 
hybrid redundant multiprocessor, there can be arbitrary numbers of processors, memories, interface
access  modules,  memory  bus  lines,  interface  bus  lines,  oscillators,  and  power  converters.  If  the 
failure  rates  of  these  elements  are  known  and  if  the  minimum  numbers  of  each  needed  for  system  survival 
are  known,  the  probability  of  exhaustion  of  spares  as  a function  of  time  can  be  calculated  using 
conventional  combinatorial  analysis. 
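
A minimal sketch of such a combinatorial calculation is given below. It assumes that all units of a class are powered and fail independently at the same constant rate, and that reconfiguration is perfect; the function names and the failure rate used in the example are illustrative assumptions, not values from the design.

```python
import math
from typing import Iterable, Tuple


def p_class_exhausted(installed: int, required: int, rate_per_hour: float, hours: float) -> float:
    """Probability that fewer than `required` of `installed` units survive to `hours`."""
    r = math.exp(-rate_per_hour * hours)  # survival probability of one unit
    return sum(
        math.comb(installed, k) * r ** k * (1.0 - r) ** (installed - k)
        for k in range(required)  # k = number of survivors, too few for survival
    )


def p_any_class_exhausted(classes: Iterable[Tuple[int, int, float]], hours: float) -> float:
    """Probability that at least one element class runs out of spares.

    `classes` holds (installed, required, rate_per_hour) for each class;
    the classes are assumed to fail independently of each other.
    """
    p_all_ok = 1.0
    for installed, required, rate in classes:
        p_all_ok *= 1.0 - p_class_exhausted(installed, required, rate, hours)
    return 1.0 - p_all_ok


if __name__ == "__main__":
    for spares in range(3):
        p = p_class_exhausted(installed=3 + spares, required=3, rate_per_hour=1e-4, hours=10.0)
        print(f"{spares} spare(s): {p:.3e}")
```

The loop at the end shows how each added spare lowers the exhaustion probability for a class that needs three working units.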

The tolerance renewal mechanism in the parallel-hybrid redundant multiprocessor is largely
contained  in  the  voters  and  the  bus  guardian  units.  Both  the  voters  and  the  guardian  units  possess 
the  bus  line  interfaces,  and  therefore  are  both  capable  of  degrading  elements  (i.e.,  bus  lines)  outside 
of  their  own  modules  (e.g.,  processor,  memory,  interface  access).  This  by  itself  is  not  qualitatively 
different  from  a single  malfunction.  The  important  concern  is  that  all  guardians  in  a single  module 
may  fail  in  such  a way  as  to  enable  that  module  to  transmit  on  more  than  one  bus  line.  As  mentioned 
previously,  design  steps  are  taken  to  minimize  the  probability  of  this  eventuality,  but  the  probability 
is  finite  that  it  will  happen.  A subsequent  failure  of  the  module  in  a malevolent  state  could  cause  an 
entire  central  computer  to  malfunction. 

Finally,  we  deal  with  the  problem  of  detecting  malfunctions  when  they  occur.  If  a malfunction 
results in a bus data malfunction, it can be detected by the voters. If not, it is termed "latent."
Latent malfunctions must be considered at least as harmful as visible ones. It is for this reason
that  each  critical  element  must  be  subjected  to  periodic  test  while  the  system  is  on  line.  As  an 
approximation  to  the  testing  process  we  assume  that  the  detection  of  malfunctions  is  an  exponentially 
distributed  random  process  with  a constant  rate  of  discovery.  Figure  12  illustrates  the  nature  of  the 
distribution.  According  to  this  model,  of  all  the  possible  malfunctions  that  will  occur,  most  of  them 
will  be  detected  after  a lapse  of  several  time  constants,  but  some  will  never  be  detected.  The 
parameter  p represents  the  rate  of  discovery.  The  system  malfunction  probability  that  results  from 
finite  malfunction  detection  time  can  be  modeled  as  a Markov  process  with  constant  hazard  rates  and 
constant  discovery  rates. 
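
One consequence of this model can be worked out directly. The sketch below is a simplification, not the Markov model of the text: it considers a single triad with one latent fault, assumes the two healthy modules each fail at a constant hazard rate, and assumes discovery of the latent fault is exponentially distributed. The function names and the example rates are assumptions.

```python
import math


def p_still_latent(discovery_rate_per_hour: float, hours: float) -> float:
    """Probability that a malfunction is still undiscovered `hours` after it occurs."""
    return math.exp(-discovery_rate_per_hour * hours)


def p_second_fault_before_discovery(module_hazard_per_hour: float,
                                    discovery_rate_per_hour: float) -> float:
    """Probability that a second module of the triad fails before the first
    (latent) fault is discovered.

    Two independent exponential processes race: failure of either of the two
    healthy modules (rate 2 * hazard) against discovery (rate = discovery rate).
    """
    second_fault_rate = 2.0 * module_hazard_per_hour
    return second_fault_rate / (second_fault_rate + discovery_rate_per_hour)


if __name__ == "__main__":
    hazard = 1.0e-4  # per hour, i.e. a module MTBF of ten thousand hours
    for label, mean_hours in (("1 second", 1.0 / 3600.0), ("1 minute", 1.0 / 60.0), ("1 hour", 1.0)):
        p = p_second_fault_before_discovery(hazard, 1.0 / mean_hours)
        print(f"mean discovery time {label}: {p:.2e}")
```

Under these assumptions, shortening the mean discovery time from an hour to a second lowers the chance of a second fault arriving first by more than three orders of magnitude, which is consistent with the earlier remark that flexing must recur on a time scale of seconds.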

The  total  probability  of  system  failure  is  the  composite  of  the  parts  previously  discussed  and  is 
given  in  Figure  13  which  illustrates  the  ability  of  the  multiprocessor  to  meet  the  reliability 
requirement.

Reliability  and  Maintenance 

The  multiprocessor's  design  approach  to  system  reliability  consists  of  a combination  of  shielding, 
environmental  control,  redundancy,  reconfiguration,  test  algorithms,  voting,  and  high  reliability  design 
and  manufacture  of  all  hardware  elements.  In  addition,  the  system  software  must  be  virtually  perfect. 

To  a certain  extent,  these  facets  of  system  reliability  are  synergistic.  That  is,  once  the  central 
computer  attains  a certain  degree  of  reliability,  it  becomes  competent  to  serve  as  its  own  manager 
and  thereby  becomes  capable  of  attaining  even  greater  survivability  by  judicious  utilization  of 
resources. 

Prior  to  each  flight,  the  multiprocessor  will  test  itself  and  the  rest  of  the  information  system  to 
purge  it  of  latent  malfunctions  and  establish  accurately  the  degree  of  tolerance  remaining.  If  this  is 
adequate  for  dispatch,  the  flight  will  proceed  maintaining  high  reliability  by  the  tolerance  renewal 
procedure  including  frequent  testing  of  every  element.  In  this  way,  a log  is  maintained  of  the  status 
of every element of the system. Intermittent, transient, and permanent malfunctions can be distinguished,
and the momentary tolerance made known to the flight crew. If changes in flight plan or
envelope  are  called  for,  these  can  likewise  be  made  known. 

Upon landing, the need for maintenance, if any, of the information system will be readily discernible,
including  in  most  cases  the  identities  of  the  elements  that  have  malfunctioned.  A possible  byproduct 
of  system  fault  tolerance  is  the  realization  of  considerable  operational  cost  saving  by  postponing 
maintenance  until  the  aircraft  arrives  at  a base  where  this  is  most  economically  accomplished.  If  the 
probability of needing a module change earlier than one hundred flight hours can be held below one
per  cent,  it  could  be  well  worth  the  inclusion  of  one  or  two  extra  spares  to  make  this  possible. 

Multiprocessor  Software  Issues 

The  major  system  software  elements  are  the  executive,  test,  diagnostic,  and  system  configuration 
programs,  and  the  macro  interpretation  facilities.  These  elements  are  all  closely  related  to  the 
computer's  architecture  and  are  responsible  for  expediting  job  steps  and  for  tolerance  renewal. 

The executive program embraces several sub-functions, which are the time queue, the event queue,
job dispatch, and the cache memory functions of invocation and retirement. The time queue and event
queue programs maintain data files containing the identities of job steps that are scheduled to be
executed. They also respectively contain for each job step the time or the enabling event that will
signal the go-ahead for emplacement. They identify any necessary restrictions on emplacement such as
insisting on or excluding a particular triad. Finally, a priority measure is maintained for each job
step. In order to execute sampled-data control algorithms, it is necessary and sufficient to dispatch
the same job step at regular intervals. The executive automatically handles the iterative scheduling
required by these algorithms. The job dispatch program is invoked by a processor triad as soon as
possible after the triad has completed its execution of a job step. This program finds any non-excluded
job steps that have become ready for emplacement, chooses the one whose priority is highest, and
initiates an invocation of this job step. This triad then performs this job step, and the job dispatch
program is released for use by other triads. If no job step is ready, the job dispatch program becomes
reinvoked after a short delay to allow other triads to maintain normal resource access.
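
The dispatch rule described above can be sketched as follows. This is a hedged illustration of the selection logic only: the `JobStep` fields, the convention that a larger number means higher priority, and the rescheduling of periodic job steps from their previous ready time are all assumptions, and mutual exclusion between triads is not modeled.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional, Set


@dataclass
class JobStep:
    name: str
    priority: int                          # larger value = more urgent (assumed convention)
    ready_time: Optional[float] = None     # time queue entry, in seconds
    enabling_event: Optional[str] = None   # event queue entry
    required_triad: Optional[int] = None   # "insisting on" a particular triad
    excluded_triads: Set[int] = field(default_factory=set)
    period: Optional[float] = None         # set for sampled-data (iterative) job steps


def is_ready(job: JobStep, now: float, events: Set[str]) -> bool:
    """A job step is ready when its time has come or its enabling event has occurred."""
    if job.ready_time is not None:
        return now >= job.ready_time
    return job.enabling_event in events


def allowed_on(job: JobStep, triad: int) -> bool:
    """Apply the emplacement restrictions for the requesting triad."""
    if job.required_triad is not None and job.required_triad != triad:
        return False
    return triad not in job.excluded_triads


def dispatch(queue: List[JobStep], triad: int, now: float, events: Set[str]) -> Optional[JobStep]:
    """Pick the highest-priority ready, non-excluded job step for this triad.

    Returns None when nothing is ready; the caller would retry after a short delay.
    """
    candidates = [j for j in queue if is_ready(j, now, events) and allowed_on(j, triad)]
    if not candidates:
        return None
    chosen = max(candidates, key=lambda j: j.priority)
    queue.remove(chosen)
    if chosen.period is not None and chosen.ready_time is not None:
        # Iterative scheduling: queue the same job step for its next interval.
        queue.append(replace(chosen, ready_time=chosen.ready_time + chosen.period))
    return chosen


if __name__ == "__main__":
    queue = [
        JobStep("control_law", priority=5, ready_time=0.0, period=0.02),
        JobStep("self_test", priority=1, ready_time=0.0, excluded_triads={0}),
    ]
    print(dispatch(queue, triad=0, now=0.0, events=set()))
    print(dispatch(queue, triad=1, now=0.01, events=set()))
```

Rescheduling from the previous ready time rather than from the completion time keeps the sampling interval regular even when dispatch is slightly late, which is one plausible reading of "regular intervals" in the text.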

Cache memory management consists of the invocation and retirement of job steps and of the procedures
and sub-procedures that constitute the job step activity. Invocation occurs for program portions that
reside in, or are transferred to, the cache memory. Retirement occurs when such a portion must be
overlaid. There is a strong analogy between cache management and page swapping in a hierarchical memory.
The purpose is to relieve the applications programmers of most of the mechanics of moving their programs
and data into the cache memory.

The  reconfiguration  program  for  the  multiprocessor  controls  module  and  bus  activity  by  sending 
messages  to  the  guardian  units.  It  maintains  records  of  the  status  of  each  element  and  algorithmically 
responds  to  contingency  situations.  Reconfiguration  programs  will  also  exist  for  the  external  parts 
of the system, primarily the input-output bus or network.

When  a malfunction  is  detected,  the  diagnostic  program  is  invoked.  This  program  attempts  to  locate 
the  malfunction  source  by  using  any  diagnostic  data  available  from  the  processor  triad  that  detected 
the  malfunction  and  by  reconfiguring  the  computer  so  as  to  cause  the  malfunction  source  to  move.  If 
the  malfunction  does  not  recur,  the  demerit  strategy  described  earlier  is  carried  out.  Similar 
programs will exist for diagnosing malfunctions in the parts of the information system external to the
computer.

Macro  interpretation  facilities,  like  test  programs,  are  quite  specific  to  the  logical  design  of 
the  processors.  They  will  exist  as  a combined  machine  language  and  microcode  facility  and  will  serve 
not  only  the  applications  programs  but  also  the  system  programs.  Cache  management  and  other  executive 
functions  will  probably  depend  largely  upon  macro  operations.  The  choice  of  macro  operations  and  the 
relative emphasis on machine language and microcode will be influenced by the applications program
requirements. 

Experimental  Multiprocessor 

In  June  1976,  an  experimental  multiprocessor  at  the  Draper  Laboratory,  employing  parallel-hybrid 
redundancy  and  closely  resembling  the  multiprocessor  described  here,  was  used  as  a digital  autopilot  in 
a simulated  Boeing  707  aircraft.  The  autopilot  functions  performed  were  minimal  but  totally  critical 
to  flight.  During  the  simulation  exercises,  malfunctions  were  injected  in  a variety  of  ways  into  the 
multiprocessor, which successfully recovered in every case. Although still far from a highly fault-tolerant
computer, this equipment has demonstrated the validity of the basic concept described here in
many hundreds of hours of operation.


CONCLUSION 


This  paper  has  described  the  principal  characteristics  of  two  reliable  multiprocessors  for  use  in 
a system  designed  to  be  highly  tolerant  of  malfunctions.  These  computers  play  a fault-tolerant  central 
computer  role  in  a distributed  digital  system  and  at  the  same  time  will  be  responsible  for  maintaining 
the  operational  status  of  the  system  and  itself. 

An  experimental  model  of  one  of  the  computers  has  been  built  and  operated  in  an  aircraft  flight 
simulation  that  exhibits  its  ability  to  survive  isolated  random  malfunctions. 

Designs  are  proceeding  on  several  fronts  for  a computer  of  this  type  that  can  operate  dependably 
in  a real  environment.  When  this  type  computer  is  coupled  with  systematically  redundant  sensors  and 
effectors,  their  local  processors,  and  valid  application  software,  it  will  be  capable  of  performing 
critical  control  functions  in  an  aircraft  with  a suitably  remote  probability  of  failure. 

Finally,  these  designs,  while  intended  to  be  capable  of  satisfying  the  very  stringent  reliability 
requirements of a modern aircraft control computer, are also appropriate to a wider class of applications
where high reliability is required.

TABLE I

FUNCTIONAL AREA                                        TAKE-OFF

(1) Navigation and Guidance
    A. CURRENT AIRCRAFT
       a. Area Navigation (R-Nav)
       b. Inertial Navigation System
       c. Autoland (Autothrottle)
       d. Radar Altimeter
       e. Autopilot
    B. 1980's AIRCRAFT
       a. Digital Autopilot with Category IIIb
       b. Hybrid Navigation
       c. 4-D Flight Path Control
       d. Heads-Up Display

(2) Stability and Control
    A. CURRENT AIRCRAFT
       a. Autopilot:
       b. Yaw Damper
    B. 1980's AIRCRAFT
       a. Cockpit Displays - Electronic Attitude Direction Indicator
       b. Cockpit Displays - Electronic Horizontal Situation Indicator
       c. Flight Envelope Limiter
       d. Ride Improvement System
       e. Limited Scale SAS (Relaxed Static Stability)
       f. Limited Scale Gust and Maneuver Load Alleviation
    C. 1990's AIRCRAFT
       a. Full Scale SAS (Negative Static Stability)
       b. Full Scale Gust and Maneuver Load Alleviation
       c. Full Scale Flutter Mode Control

(3) Air Traffic Control
    A. CURRENT AIRCRAFT
       a. Weather Radar
       b. Ground Proximity Warning System
    B. 1980's AIRCRAFT
       a. Digital Weather Radar Processing
    C. 1990's AIRCRAFT
       a. Digital Data Link
       b. Airborne Traffic Situation Display
       c. Collision Avoidance System

(4) Aircraft Systems Management
    A. CURRENT AIRCRAFT
       a. Fuel Control (Tankage)
       b. Fuel Control (Engines)
       c. Engine Inlet Control
       d. Center of Gravity Indicator (Pre-Flight)
       e. Weight Monitor (Pre-Flight)
       f. Cabin Environment
       g. Air Data System
       h. Master Caution System
       i. Fire Warning System
       j. Power Generation and Distribution
       k. Automatic Braking System
       l. Aircraft Integrated Data System (Maintenance)
    B. 1990's AIRCRAFT
       a. Integrated Data Management System

Figure 7. SIFT Failure Probability.


OBJECTIVES FOR THE DESIGN OF
IMPROVED  ACTUATION  SYSTEMS 


B.  H.  EARLEY 

CONTROL  SYSTEMS  DEVELOPMENT  BRANCH 
AIR  FORCE  FLIGHT  DYNAMICS  LABORATORY,  WPAFB 


ABSTRACT 


Actuation system reliability is particularly critical to the success of modern
Flight Control System (FCS) techniques which utilize high authority electrical controls.
Reliability requirements, combined with typical performance requirements, have a substantial
effect on the actuator, resulting in a high level of sophistication, complexity, attendant high
cost and reduced maintainability. As background, a description of actuation systems
evolution and several redundant system approaches are included. Significant redundant
system criteria are discussed, followed by a series of design objectives pertinent to relief
of typical actuator problems. A basic design approach is also recommended to assure timely
integration of the FCS in the vehicle design process and proper interfacing of the primary
technical disciplines. The implementation and benefits of actuation system development
programs are discussed and several new actuation concepts are mentioned.


INTRODUCTION 


With the application of high authority, closed-loop electrical control techniques
to primary Flight Control Systems (FCS), the impact of reliability requirements upon
actuation systems is greater than ever. Control Augmentation Systems (CAS) and Fly-By-Wire
(FBW) in particular utilize these techniques, which demand redundant control loops and
hence redundant actuation channels. Typical mechanical primary control systems are relatively
straightforward and adequate reliability can be obtained through conservative design
practices, dual load paths, etc. On the other hand, the sophistication of electrical
control concepts and redundant actuators presents a real challenge to be overcome in the
process of assuring effective reliability. As with the mechanical approach, reliability of
the electrical primary control system is essential to both flight safety and mission
effectiveness.

The impact of redundancy on secondary actuation has been particularly significant.
For each electrical input signal path, a single stage electrohydraulic transducer provides
a hydraulic signal to one or more hydraulic amplification stages producing a final mechanical
output which drives the main control valve of the surface power actuator (in conjunction
with a mechanical input, if it exists, as in the CAS approach). Proper operation of
all secondary actuator stages is dependent upon precision electromechanical or hydromechanical
components which are expensive. Multiplication of each stage by three or four,
the number of channels usually required at least where there is no mechanical input as a
backup, results in an impressive amount of complexity and cost. Complexity and precision
components similarly increase supply logistics costs and can decrease maintainability.

Stringent FCS performance requirements are dictated by the need to develop aircraft
which are superior to those available to a potential aggressor. Modern FCS require
extensive interfacing and each application has been sufficiently unique to result in little
or no commonality. These factors, in addition to reliability requirements, have collectively
been the prime drivers leading to current actuator systems sophistication, complexity
and cost.

Due to recent FCS technical advances leading to the application of CAS and FBW
systems, the potential for FCS performance growth is unusually high. Simultaneously, the
full realization of all benefits will require effective redundancy management, cost
controls and improved maintainability, with little or no compromise to the performance
potential. An awareness and appreciation of these factors, in addition to a comprehensive
understanding of the basic FCS technology, will be prerequisite to the design of a truly
successful actuation system.

This chapter focuses exclusively on the actuation system as does no other section
of this document. It is significant to note that, with the exception of flight control computers,
the cost of actuator assemblies can exceed the cumulative cost of the balance of
all other FCS components and assemblies. Also, they are most vulnerable to the impact of
other criteria, such as reliability.




ACTUATION EVOLUTION




Typical of primary actuation systems are those that control the pitch, yaw and
roll axes via elevator, rudder and aileron surfaces respectively. Depending on the aircraft,
other surfaces or combinations thereof may be utilized for primary control functions.
Spoilers are employed to decrease wing lift and to augment roll control; a stabilator, or
all moving horizontal tail surface, to provide pitch control; and elevons, a combination
of ailerons and split elevator surfaces, to provide pitch and some degree of roll control as
well. Leading and trailing edge flaps, trim devices, speed brakes, etc. are not
generally considered to be primary flight controls.

Primary control surface power actuators are usually hydromechanical servomechanisms
controlled directly by the pilot or through electrically oriented systems. The
power source is a remote control hydraulic system (or systems). They are essential FCS
elements which in their most elementary form would basically consist of a control valve,
cylinder and piston combination. On the other hand, when secondary actuation functions are
included, a ship's set may represent 30% of the total FCS cost (exclusive of computational
elements). The power actuators are capable of a bi-directional force output, usually
linear, or occasionally rotary, which is relatively large in contrast to the actuator size
and weight. Output displacement is proportional to input signal magnitude, and output
force developed must be compatible with the opposing (aerodynamic) load, which is usually
in direct proportion to the displacement. A variable rate is also provided as a function
of control valve opening and actuator load.

With a mechanical input/feedback system, displacement of the pilot's control
proportionally displaces the actuator control valve, causing actuator or piston rod
displacement. The actuator force required to drive the surface is generated by metering a
relatively high pressure volume of fluid through the control valve to one side of the actuator
piston and relieving the relatively low pressure volume on the other side. Motion continues
until the rod is repositioned at the point where the mechanical feedback re-nulls the
control valve at neutral. The linear output of the unit is translated into rotary surface
motion via a crank arm. Most often the control surface will have a neutral or faired
position, being capable of an approximately equal rotation in either direction, e.g. ±25
degrees.
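
The re-nulling action described above can be illustrated with a minimal numerical
sketch. It assumes an idealized actuator in which piston rate is simply proportional to
valve opening; the effect of actuator load on rate is ignored, and the gain, time step and
function name are illustrative assumptions rather than values from any actual design.

```python
def simulate_servo(command, gain=20.0, dt=0.001, steps=2000):
    """Integrate an idealized position servo with mechanical feedback.

    command : pilot input expressed as commanded rod position (arbitrary units)
    gain    : assumed piston rate per unit of valve opening (1/s), hypothetical
    Returns the final rod position and the residual valve opening.
    """
    position = 0.0
    for _ in range(steps):
        # The feedback linkage subtracts rod position from the pilot input,
        # so the valve opening is the remaining position error.
        valve_opening = command - position
        # Flow through the valve sets piston rate (load effects neglected).
        rate = gain * valve_opening
        # The rod moves until the feedback re-nulls the valve.
        position += rate * dt
    return position, command - position


final_position, residual_valve_opening = simulate_servo(command=1.0)
print(final_position, residual_valve_opening)
```

In this sketch the rod settles at the commanded position and the valve opening decays
toward zero, which is the behavior the mechanical feedback linkage is intended to produce.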

Manual Operation:

Prior  to  World  War  II  primary  control  surfaces  were  operated  manually  except  for 
very  special  cases.  Where  high  aerodynamic  forces  were  present,  motion  of  the  pilot's 
control,  through  a mechanical  path  of  cabling,  push-rods,  bellcranks,  etc.,  drove  a surface 
tab  (similar  to  a trim  tab)  in  lieu  of  driving  the  surface  directly.  (See  Figure  1) 

Surface  tab  motion,  the  reverse  of  the  surface  motion  desired,  created  a driving  moment, 
forcing  the  surface  itself  in  the  proper  direction.  Hence,  aerodynamic  forces  acting  on 
the  tab,  which  was  integral  to  the  surface  itself,  created  the  muscle  positioning  the 
control  surface  to  achieve  the  required  aircraft  attitude  or  response. 

A variety of other aerodynamic balancing devices were also used to limit the
forces which must be overcome by the pilot. Except for these devices, built-in mechanical
advantage or other assisting mechanisms, all forces applied to the surface were derived
solely from the pilot. These systems were force limited, slow and completely dependent on
pilot input; however, they were inherently reliable, due in large part to their
straightforwardness.

Hydraulic  Boost  Operation: 

The next step in actuator development was to incorporate a hydraulically powered
boost cylinder between the pilot and the control surface (normally immediately adjacent to
the surface) to supplement the pilot-initiated forces to the extent required to drive the
surface. (See Figure 2) Hydraulic power was selected primarily due to its force vs weight
effectiveness. The pilot's input was connected to the surface crank arm but also moved a
valve controlling the boost cylinder piston. Cylinder output force was applied to the same
surface crank arm, attached in turn to the control surface, hence augmenting the pilot's
input force. This mechanization provided the potential for higher surface rates and was
comparatively simple and reliable; however, it was still fully dependent on pilot input.

Fully  Powered  Actuator: 

A more sophisticated version of the boost cylinder was the fully powered mechanical
input/feedback hydraulic servoactuator. (See Figure 3) As above, pilot input controlled
valve position, determining the direction of flow and rate to the hydraulic
cylinder, except that there was no parallel mechanical path to the surface. Accordingly
the system was irreversible, not permitting any load feedback from the surface to the
cockpit, thereby avoiding load non-linearities (reversals, etc.) such as occur in the transonic
flight region. This approach also unfortunately eliminated inherent load sensing feedback,
necessitating the implementation of artificial feel systems. Although not shown in the
figure, a mechanical feedback linkage configuration sensed actuator output displacement so
that when surface position corresponded to the input command, the initial valve signal was
offset, returning the valve spool to a null position. Pilot control of aircraft response
was indirect via sensing of the surface position, i.e. if additional response was desired,
the control displacement would be increased, requiring additional actuator stroke for the
feedback to null out the control valve. This actuator type provided full driving force to
the surface, inherent irreversibility and more positive and accurate control.

Multi-Input Actuator:




Next, during WW II, the ability for the actuator to respond to an electrical
input autopilot signal was added. An electrohydraulic servo consisting of an Electro
Hydraulic Valve (EHV) controlling a hydraulically powered secondary actuator (or servo)
was located in parallel with the mechanical control path. (See Figure 4) The low level
autopilot input to the EHV was converted to a hydraulic signal driving the secondary
actuator piston, producing a proportional mechanical signal of sufficient strength to mix
with the pilot's mechanical input. The arrangement was such that the autopilot electrical
input not only moved the power actuator control valve, but the entire linkage system as
well. This provided the pilot a means of monitoring the automatic inputs through motion of
the controls. Since autopilot corrections required during cruise were small, the secondary
servo authority was limited, giving the pilot capability to override the inputs and
avoiding the necessity for redundancy.


Stability  Augmentation  Systems  (SAS): 


For two basic reasons SAS came into being. First, due to the differences in
aerodynamic loading and response, it became impossible to obtain good flying qualities in all
flight regimes, particularly at both ends of the speed range. Secondly, inherent stability
varied considerably as aircraft approached and exceeded the speed of sound. Vehicle
designs exhibited marginal stability under certain conditions; therefore, another type of
automatic system was required to compensate. SAS implementation is typical of the
configuration shown in Figure 5, utilizing a secondary hydraulic series servo in the mechanical
path. Rate gyroscopes sensed aircraft oscillations so that the undesired motions could be
damped out by the control surfaces through the actuation system. Electrical inputs from
the gyros drove the secondary servos in the same manner as the autopilot; however, the
series signal interface was such that the SAS input added to or subtracted from the pilot
input and was not reflected back to the cockpit controls. Similarly, SAS authority was
limited and redundancy was usually not employed. Inherent disadvantages of this
mechanization were that it could reduce pilot authority and could not differentiate between pilot
and disturbance inputs.
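
The series summation of a limited-authority SAS signal with the pilot input can be
sketched as follows. This is an illustration only: the damping gain, the authority limit
and the function names are assumed values chosen for the example, not figures taken from
any particular aircraft.

```python
def clamp(value, limit):
    """Restrict value to the symmetric range [-limit, +limit]."""
    return max(-limit, min(limit, value))


def surface_command(pilot_input, pitch_rate, damping_gain=0.5, sas_authority=0.1):
    """Series summation of pilot input and a rate-damping SAS signal.

    damping_gain and sas_authority are hypothetical values; the SAS signal
    is expressed in the same units as the pilot input.
    """
    # The rate gyro signal is applied so as to oppose the sensed motion.
    sas_signal = -damping_gain * pitch_rate
    # The secondary series servo has limited authority.
    sas_signal = clamp(sas_signal, sas_authority)
    # The SAS input adds to or subtracts from the pilot input downstream of
    # the cockpit controls, so it is not reflected back to the pilot.
    return pilot_input + sas_signal


print(surface_command(pilot_input=0.0, pitch_rate=0.1))   # small disturbance
print(surface_command(pilot_input=0.0, pitch_rate=1.0))   # authority limit reached
print(surface_command(pilot_input=0.3, pitch_rate=0.1))   # pilot-induced rate also opposed
```

The last call shows the disadvantage noted above: the damper opposes a pitch rate
regardless of whether it originated from a disturbance or from the pilot's own command.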


Control Augmentation:

To avoid the problem of the automatic system detracting from control effectiveness,
an electrical feedforward path sensing pilot control inputs could be added to the Figure 5
configuration and summed with the airframe dynamic inputs such as the rate gyros and
accelerometers. With this addition, as shown in Figure 6, the control loop could be
closed via an electronic system so that control inputs would be compared to actual vehicle
response rather than to a surface position. Thus a third basic type of mechanical/electrical
input system, the Control Augmentation System (CAS), was derived. With this mode of
control the primary path was electrical in nature with the mechanical path utilized as a
backup. Accordingly, the level of authority possible through the secondary servo was
increased up to 100%, necessitating redundant techniques. At this point the technology
and equipment reliability available were such that primary control via multiple channels
was practical; however, the potential to significantly complicate the system was introduced.

The Figure 6 diagram does not show an independent (or remote) secondary actuator,
but is typical of packaging to incorporate both secondary and power actuator functions.
Either parallel or series arrangements could be accommodated within a single assembly. On
the other hand, the actuator as a single Line Replaceable Unit (LRU) became complicated
and expensive, particularly so since redundant channels were required. If the secondary
and power actuator functions were separated, the vast majority of the cost and complexity
would simply be shifted to the secondary unit since this is where the actuator redundancy
must be concentrated. Also, the extent of actuator interfacing required is more than
doubled.

Electrical input capability has provided tremendous advances in flight control
technology. It automatically controls the aircraft during cruise, compensates for subsonic
and supersonic flight conditions, offsets stability limitations and actually improves
vehicle control by providing more responsive and accurate control than would otherwise be
possible. With the advent of Fly-By-Wire (FBW), the near future offers much more to be
exploited.

Fly-By-Wire:

The complete elimination of the mechanical control path results in a FBW control
system where all commands to the actuators are transmitted electrically, utilizing high
authority closed loop techniques. Vehicle dynamic feedback again closes the control loop
so that airplane response rather than surface position is directly controlled. A typical
implementation is shown in Figure 7. Today the state-of-the-art has progressed to a point
where confidence due to experience with electrical input techniques and the reliability of
available equipment is adequate to justify the full FBW approach. Pseudo FBW systems,
where a normally disengaged mechanical path is retained for backup, are still popular and
the first pure FBW operational aircraft has yet to enter the active inventory. Retention
of the mechanical controls as a backup is a significant disadvantage because of its
associated cost, attendant complexity, rigging requirements and the fact that a comprehensive
declutching scheme is needed. Where it is necessary for the mechanical and electrical paths
to operate simultaneously, as with the CAS, both must be closely synchronized to minimize
mechanical anomalies which would adversely affect the function of the electrical primary
path. For example, mechanical hysteresis could introduce limit cycling, or underdamped
system oscillations, making precise operation of the mechanical elements mandatory.

Exclusive of the considerable potential for FBW derivative concepts and the
increased opportunity for major systems integration, the basic FBW approach offers many
advantages. These are an improved level of precise performance, improved reliability
through redundant techniques, reduced weight and space requirements, reduced design,
installation and maintenance and, finally, reduced life cycle costs. With the elimination
of the mechanical path, overall system complexity is reduced; however, with the attendant
need for redundancy, the actuator, particularly the secondary assembly, remains quite complex.

Current State-of-the-Art:

Today's flight control systems are sophisticated, high performance systems capable
of enhancing vehicle performance as well as providing basic control. FBW technology
is now about to open a new era in flight controls. It has the potential not only to
significantly affect basic aircraft design and configuration, but also to remove
long-standing constraints, for example the need for a complete mechanical control path and
complex feel systems. Most important, FBW will provide the catalyst to broaden controls
application and the functions which may be accordingly performed, substantially increasing
the flexibility and usefulness of the hydraulic servoactuator, but increasing the demands
it must satisfy. Typical new technology applications include alleviation of structural
loads, attenuation of aerodynamic flutter, flight path smoothing, safe control with reduced
static stability, and maneuver load control. Maneuverability and performance are enhanced
by systems providing Direct Lift Control (DLC) and Direct Side Force Control (DSFC). All
of the associated benefits are facilitated, if not made possible, via FBW technology.

Integration of the FCS with other major aircraft systems is now much more
practical. As an example, the engine controls may be coupled with the flight controls to
provide improved and automatic intersystem coordination during certain flight modes such as
take-off and landing. Or, particularly applicable to fighter aircraft, the flight controls
may be integrated with the armament system to properly aim the fuselage to assure accuracy
in weapons delivery. Tracking systems can be readily coupled to the FCS to maintain the
attacking aircraft in firing position throughout maneuvers initiated by the target
aircraft. It is further practical today to interface the primary flight control system to an
electronic terrain avoidance system allowing low altitude, high speed penetration of
bombers or fighters.

Finally, the effectiveness and efficiency with which the complete aircraft
mission is carried out can be maximized via airborne computer systems. These systems are
programmed to cause the flight control system and the aircraft in turn to perform the
desired functions in accordance with predetermined mathematical approximations of the
corresponding inputs required. The mathematical approximations are multimode control laws
which allow the flying qualities to be modified to produce nearly ideal dynamic
characteristics tailored to specific mission tasks. (Ref 1)

All of the above are new tools which the designer may utilize to satisfy aircraft
control requirements more effectively. Many of the types of applications named are new
and unique and most have at least been test flown; however, the technology is essentially
untried in operational fleet aircraft. Once matured, the FCS improvements realized should
be dramatic.

Redundancy  Management  Concepts: 

Although the idea of redundancy is not new to flight controls, the demands made
by the introduction of FBW are so substantial that the subject of FCS redundancy management
for modern aircraft is a new challenge of tremendous complexity. This is so much the case
that several aerospace companies have undertaken comprehensive laboratory development
programs primarily for educational purposes. These programs, frequently internally funded,
are appropriately considered by management as necessary to the company's competitive status
by providing a firm technical base for future programs. Companies with actual FBW
experience also benefit from this approach by expanding their backgrounds and by obtaining
maximum system data prior to flight test.

The typical servoactuator will incorporate several visible levels of redundancy.
A popular power actuator configuration is the dual tandem type having two isolated in-line
cylinder/piston chambers controlled by a one piece dual tandem valve. Like the valve, the
two pistons and the rod which transmits the actuator output force will be a single
structural entity. The mechanical simplicity and predictable failure modes, combined with
appropriate design conservatism, typically enable this straightforward approach. Hence the
output itself is generally a single load path; however, the valve and power actuator are
dualized to allow for continued operation at 50% output subsequent to failure of one of
the two hydraulic power sources.

Continuing the description of the actuator from output to input, the major impact
of redundancy is essentially limited to the hydromechanical secondary actuator. Each
independent redundant channel which is required to provide a given overall level of
reliability adds to the total complexity and number of precision parts. It is at this
point that the designer's task becomes most comprehensive. There will generally be three
or four separate and equivalent electrical channel inputs of relatively low signal strength
to the secondary actuator. How these signals are combined and utilized before and after
failure(s) is dependent on the redundancy concept selected. Frequently, after having
sustained two failures the system must still be operative, reverting to a fail safe or neutral
condition on the third failure. This is a Double Fail Operate/Fail Safe (DFO/FS) system.
Not all dual failure conditions may be accounted for, particularly if the possibility of
their combined occurrence is less than remote.

Where not all dual failures would permit continued operation, strictly speaking
the system would be classified as Single Fail Operate (SFO). Equally important is the
level of performance degradation which can be tolerated after each failure. For example,
with a DFO/FS system it may be practical to complete the aircraft mission, while with a
SFO system, return to base would be imperative.
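
The fail operate/fail safe classifications can be expressed as a simple rule on the
number of failures sustained. The sketch below is an idealization: it assumes every
combination of failures up to the tolerated count permits continued operation, which, as
noted above, is not always the case in practice. The names used are illustrative.

```python
from enum import Enum


class Mode(Enum):
    OPERATIVE = "operative"
    FAIL_SAFE = "fail safe"


def system_mode(failed_channels, failures_tolerated):
    """Return the system state after a given number of channel failures.

    failures_tolerated = 2 models a DFO/FS system, 1 models an SFO/FS system.
    Idealized: all failure combinations up to the limit are assumed covered.
    """
    if failed_channels < 0:
        raise ValueError("failed_channels must be non-negative")
    if failed_channels <= failures_tolerated:
        return Mode.OPERATIVE
    # One failure beyond the tolerated count reverts the system to a
    # fail safe or neutral condition.
    return Mode.FAIL_SAFE


for failures in range(4):
    print(failures, system_mode(failures, 2).value, system_mode(failures, 1).value)
```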

Although numerous derivatives exist, there are probably fewer than half a dozen
basic redundancy schemes. Several popular types will be described, beginning with the
force sharing approach where the outputs of each individual secondary (or power) actuator
channel are structurally combined, driving the load collectively.

Force  Sharing: 

An example of a force sharing system was the philosophy applied to the American
Supersonic Transport (SST), where the basic concept was applied to all primary surface power
actuators, to the secondary actuators or Electric Command (EC) servos and to Master Servo
(MS) units. In the pitch axis, shown in the simplified schematic of Figure 8, a highly
reliable electronic stability augmentation system was required for safety of flight. Four
redundant electronic channels were necessary to assure that the loss of function resulting
from multiple failures would be an extremely remote possibility. The primary mode of
control was electrical with a single load path mechanical system providing a backup following a
multiple failure. Safety and high electronic systems integrity dictated the utilization of
two functionally different electronic systems, the Hardened Stability Augmentation System
(HSAS) and the Electric Command and Stability System (ECSS). The HSAS provided pitch axis
stabilization to ensure minimum safe handling qualities and the ECSS provided normal
actuation system control and aircraft handling qualities.

Pilot commands were transmitted mechanically through a single cable system to a
three channel Master Servo (MS) unit which functioned as a force isolating boost servo and
was connected mechanically to the control valves of four individual stabilizer actuators.
The cable system was able to be non-redundant since this path provided a backup function
only. Accordingly the feel mechanism was implemented at the control columns and a spring
centering device for the MS units, to ground the mechanical system following cable failure,
was located at the opposite end of the cable system. Multiple load paths were provided
between the control columns and the feel system and between the MS units and the power
actuators.


Pilot commands were transmitted electrically from four control column mounted
transducers through both the HSAS and ECSS systems to the Electric Command (EC) servos
which were integral with the power actuators. EC servo output was summed with the MS
(mechanical) output immediately upstream of the actuator main valves.

Since the cable (mechanical) system input was non-essential, total surface
command capability was necessary for the electrical control mode. During normal operation
mechanical input was sensed by a negator transducer and fed back to the system electronics
where it was summed with the HSAS and ECSS system outputs. Accordingly the mechanical
system input was subtracted from the EC servo electrical input so that the servo output was
equal to the difference between the desired command and the mechanical input.
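
The effect of the negator transducer arrangement can be shown with a short worked
sketch. It treats all signals as scalars in common units and takes the "desired command"
to be the combined output of the system electronics; both are simplifying assumptions,
and the function names are illustrative.

```python
def ec_servo_output(desired_command, mechanical_input):
    """EC servo output: the desired command less the sensed mechanical input."""
    return desired_command - mechanical_input


def main_valve_input(desired_command, mechanical_input):
    """Sum of the MS (mechanical) and EC servo outputs upstream of the main valve."""
    return mechanical_input + ec_servo_output(desired_command, mechanical_input)


# Normal operation: the mechanical path carries part of the command and the
# EC servo supplies only the difference.
print(ec_servo_output(0.8, 0.5), main_valve_input(0.8, 0.5))

# Mechanical input absent (for example, grounded after a cable failure):
# the EC servo must supply the entire command by itself.
print(ec_servo_output(0.8, 0.0), main_valve_input(0.8, 0.0))
```

In both cases the main valve receives the desired command, which illustrates why total
surface command capability was required of the electrical path.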

To provide the required level of reliability, both the HSAS and ECSS were four
channel, powered by independent electrical sources, and all hydraulic elements were
supplied by separate hydraulic systems. Complete HSAS channel separation was required,
permitting no interchannel connections or cross channel voting. ECSS system gain scheduling
and cross channel voting were incorporated, but the additional complication was considered
non-essential to the HSAS and likely to compromise inherent reliability and simplicity.
The ECSS was the only system non-essential to safety which interfaced with the HSAS;
however, ECSS implementation was such that its failure could not affect HSAS operation.

Each EC servo output, in addition to driving the valve linkage of the actuator
with which it was integral, was also interfaced with the other EC servos by an external
synchronizing shaft. Anti-jam detents prevented a single EC servo failure from affecting
more than one actuator. When such a failure occurred, depressurization of the faulty
actuator allowed continued operation on the remaining three channels.

EC servo equalizing bypass valves were utilized in each servo to minimize input
offsets and tolerances between the electrical channel signals and to avoid switching
transients. Since the EC outputs were force summed, equalization was implemented by
[illegible] Linear Variable Differential Transformer (LVDT) valves to balance the
steady state individual channel inputs. An [illegible] with a pre-determined
threshold was incorporated to avoid eliminating the steady state force capability of the
servos. (Ref 2)

In the yaw and roll axis, rather than integrating the EC servos with the power
actuators, the four EC channels were packaged together, with an integral synchronization
shaft, as a single Line Replaceable Unit (LRU). One EC servo package, mounted separately
and interfaced mechanically, was provided for electrical control via the ECSS of each group
of surface power actuators. Identical MS units were interfaced as in the pitch system.

With the exception of the pitch EC servos, all other units were identical, and
as in the pitch system had DFO/FS capability. Each EC servo piston was monitored by an
LVDT which, following a failure, provided an output feedback causing the remaining good
channels to oppose the failed channel and minimize failure effects. Once failed, a channel
would be depressurized, the piston bypassed and the synchronization shaft detent released.
Degraded, but adequate operation was practical following failure in any two channels. As
each failed EC channel shaft detent was released, a ground bias load was imposed on the
remaining channels. Upon failure of a third EC channel, the remaining channels would be
shut down and the collective bias load provided an effective ground point for control
through the mechanical path. Hence the EC servos failed safe and mechanical reversion
control (without HSAS) was practical for all except aft c.g. locations.

The MS units each utilized three hydromechanical channels (with three hydraulic
sources) all receiving the same mechanical input through a cable system from the pilot
controls. With the external spring centering device, which caused the three MS to return
to a neutral position following a cable failure, and since there were no provisions for
selective channel shutdown, the redundancy level was SFO/FS. The MS units were strictly
hydromechanical; therefore any servo channel differences were force averaged at the output
summing point.

All primary control surfaces were driven by groups of three power actuators with
the exception of the stabilizer and lower rudder, which used four, and the upper rudder,
which used two. Output levels were selected so that four-actuator groups could develop
133% and three-actuator groups 150% of the surface design hinge moment. Accordingly,
following a single channel failure, depressurization and subsequent bypass of the channel
would allow essentially 100% of the design value to be developed. Normal operation of the
actuator groups, like the MS units, was in a force averaging mode. Operation of the EC
servos, due to the equalizer feedback concept employed, was typical of a majority voting
mode.
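
The hinge moment figures quoted above follow from simple arithmetic if each actuator in
a group is assumed to contribute an equal share and a bypassed actuator is assumed to
contribute nothing (and to impose no drag). Those assumptions, and the function name, are
introduced here for illustration only.

```python
def available_hinge_moment(group_size, group_capability_pct, failed=0):
    """Hinge moment available from an actuator group, in percent of design value.

    Assumes equal sharing between actuators and that failed (bypassed)
    actuators neither contribute to nor oppose the output.
    """
    per_actuator = group_capability_pct / group_size
    return per_actuator * (group_size - failed)


# Four-actuator group sized for 133%: one failure leaves about 100%.
print(available_hinge_moment(4, 133.0, failed=1))
# Three-actuator group sized for 150%: one failure leaves 100%.
print(available_hinge_moment(3, 150.0, failed=1))
```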

Active  Standby: 

A redundancy concept frequently utilized in applications where completely
independent electrical and hydraulic sources are limited is that popularly known as active
standby. The concept utilizes a two or three channel secondary actuator with the outputs
mechanically bussed together; however, as implied, only one channel controls the load since
the standby channel piston(s) are bypassed hydraulically and hence in a standby mode.
Although bypassed, the standby channel piston(s) track the active piston, the EHV(s)
receive an equivalent input signal and accordingly track the active channel EHV. Generally,
each channel would include an electronic model of the EHV control loop for comparison with
actual EHV output obtained via an LVDT monitoring the valve second stage output. LVDT
feedback during normal operation will limit differences between the active and standby
channels to minimize transients should reversion become necessary. Upon detecting an
output difference above a pre-determined limit in the active channel, an electronic comparator
output would then initiate a correction sequence disabling the channel and activating a
standby channel. The failure detection/correction logic may also be implemented
hydromechanically, although this may be a more expensive alternative due to the precision
components required.
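
The model comparison and reversion sequence described above can be sketched in a few
lines. This is a simplified illustration in which each channel is represented by a single
steady-state value; the threshold, the channel ordering used for reversion and the
function name are assumptions, not details of any particular mechanization.

```python
def select_active_channel(valve_outputs, model_outputs, active, threshold):
    """Return the index of the channel that should control the load.

    valve_outputs[i] : LVDT-measured EHV second stage output of channel i
    model_outputs[i] : output of channel i's electronic model of the EHV loop
    active           : index of the channel currently in control
    threshold        : pre-determined comparator limit (hypothetical value)
    Returns None when no channel passes its comparison (fail safe reversion).
    """
    def healthy(i):
        return abs(valve_outputs[i] - model_outputs[i]) <= threshold

    if healthy(active):
        return active
    # Active channel exceeded the limit: disable it and engage the first
    # standby channel that still agrees with its own model.
    for i in range(len(valve_outputs)):
        if i != active and healthy(i):
            return i
    return None


model = [1.0, 1.0, 1.0]
print(select_active_channel([1.00, 1.02, 0.99], model, active=0, threshold=0.05))
print(select_active_channel([1.50, 1.02, 0.99], model, active=0, threshold=0.05))
```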

The actual level of redundancy obtained will be dependent on the number of
channels and the particular mechanization; however, although two channels may be limited to
SFO, three channels (one active/two standby) will not necessarily provide full DFO
capability. Frequently, protection is provided for the less remote dual failure
possibilities without incurring the complication and expense of accommodating all.

The secondary actuator may be integral to or remote from the power actuator,
which most often will be a dual tandem configuration. Hydraulic supply failure(s) must
also be accommodated by the secondary actuator assembly; however, if only two independent
sources are available (frequently typical in fighter aircraft), a single system may be
shared between two of three channels or a method may be devised to provide partial
separation within one system. These expedients may be necessary in many multi-channel
applications.

Active-Active:

A modification of the active standby concept involves simultaneous operation of a
pair of active channels, using the output of each to directly control one half of a tandem
power actuator. A third channel may also be conveniently utilized as a hydromechanical
model, which will more effectively model typical hydromechanical non-linearities as compared
to an electronic model. The output of the two active channels may be balanced by a spool
equalizer valve to offset any input or EHV induced output differences. Differential output
of both EHVs is sensed by the equalizer and subsequent spool position fed back mechanically
to the EHV torque motors. Excess spool displacement (above a pre-determined limit)
would signal a failure in one of the active channels. Comparison between the active
channel outputs and the third channel model enables the hydraulic failure logic to select
and disengage the faulty channel. The actuator then continues to operate at one-half of
the normal output level. A failure in the model channel does not affect active channel
operation, but upon detection provides an electrical output for indication to the pilot.
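
The selection logic just described can be illustrated with a short Python sketch. It is
only an illustration under stated assumptions: the function name, the numeric signal
values, and the use of a single common limit are hypothetical, and the real mechanization
is hydromechanical rather than computed.

```python
from typing import Optional


def classify_failure(active_a: float, active_b: float,
                     model: float, limit: float) -> Optional[str]:
    """Return 'A' or 'B' (active channel to disengage), 'MODEL', or None.

    All arguments are hypothetical normalized channel outputs; `limit` is an
    assumed pre-determined mismatch limit shared by both checks.
    """
    # The equalizer senses only that the two active outputs disagree.
    if abs(active_a - active_b) > limit:
        # The model channel breaks the tie: the active channel that is
        # farther from the model output is taken to be the failed one.
        if abs(active_a - model) > abs(active_b - model):
            return "A"
        return "B"
    # Active channels agree; a model that disagrees with both is flagged
    # for pilot indication only and does not disturb active operation.
    if abs(model - (active_a + active_b) / 2.0) > limit:
        return "MODEL"
    return None
```

For example, `classify_failure(1.0, 2.0, 1.1, 0.2)` selects channel B for disengagement,
since channel A is the one that agrees with the model.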

This type of mechanization is typical of the F-111 CAS actuator. The package
and its failure logic are integral, and the package is remote from the power actuators.
Mechanical pilot input is interfaced at a point between the CAS and the power actuators.
The aircraft utility hydraulic system supplies one active and the model channel and the
primary hydraulic system supplies the remaining active channel. The level of redundancy
is SFO, although continued operation is possible following a number of specific dual
failure occurrences. (Ref 3)

Active-On-Line:

A final variation of active standby to be discussed is active-on-line, which may
have considerable near future application. The primary difference exhibited by this
approach is that the on-line (standby) channel pistons are not bypassed and track the
active channel of their own accord rather than being forced by it. Active channel control
is assured by a high gain negative force feedback which limits the output force of the
on-line channels. On the other hand, the on-line channels normally operate in a force
sharing mode and will immediately oppose a failed channel. The failure logic would
incorporate an electronic model in each channel with inner and outer loop comparators.
Following detection of a signal mismatch above pre-determined limits in the active channel,
a sequence would be initiated, after an appropriate time delay, to disable the active
channel and cut off the feedback circuit in the first on-line channel, making it active.
Initial opposition to the failure occurs without intervention of the detection/correction
logic and its associated time delay. Pressure feedback is utilized to limit errors
between the active and on-line channels, therefore minimizing switching transients. With
three channels, DFO capability is practical. This concept may prove to be most tolerant
of dual failures, having the least adverse effect on post failure performance.
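
The reversion sequence described above can be sketched in Python as a simple bookkeeping
model. This is an illustration only: the class name, channel numbering and method names
are assumptions, and the time delay, force feedback and pressure feedback are not modelled.

```python
from typing import List, Optional


class ActiveOnLineSet:
    """Hypothetical bookkeeping for one active channel plus on-line channels."""

    def __init__(self, n_channels: int = 3) -> None:
        # Channel 0 starts as the active channel; the rest are on-line.
        self.healthy: List[int] = list(range(n_channels))
        self.failures = 0

    @property
    def active(self) -> Optional[int]:
        """The channel currently in control, or None if all have failed."""
        return self.healthy[0] if self.healthy else None

    def disable(self, channel: int) -> None:
        """Disable a failed channel; the first remaining channel becomes active."""
        if channel in self.healthy:
            self.healthy.remove(channel)
            self.failures += 1

    @property
    def operative(self) -> bool:
        return self.active is not None
```

With three channels the set remains operative after any two channels are disabled, which
corresponds to the DFO capability mentioned in the text.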

Redundancy  Criteria: 

Proper redundancy management is integral to the achievement of adequate flight
safety and mission effectiveness for high authority CAS and FBW systems. It is also
essential to effective cost control and to successful design, implementation and
operation, and is frequently the most difficult technical challenge facing the actuator
designer. The approach to be taken and the extent of redundancy necessary to meet
specified reliability levels actually determine the mechanization and the extent of
complexity of the secondary actuation system elements. The significant factors are:

a. Criticality of the function to be performed to flight safety and mission
success. If after one or more reasonably probable failures, including hydraulic supply
failure, continued operation of a control surface is essential, then appropriate actuator
redundancy must be incorporated to assure given performance levels after each occurrence.
Generally electrical control loop reliability must be equivalent to that of a primary
mechanical path.

b. Determination of the basic control modes. Is the primary control mode to be
electrical or mechanical and is a dissimilar back-up mode to be incorporated? If
electrical, the control authority will usually be 100% and the greater the authority, the
more comprehensive the redundancy must be. A mechanical backup may reduce the extent of
secondary actuator redundancy; however, it also introduces electrical/mechanical
interfacing problems.

c. Type of aircraft and its configuration. Typical variables are: the type of
aircraft and mission, power source availability (hydraulic and electrical), space/weight
limitations, interactive effects between control axes, the degree of integration required,
etc.

d. The level of reliability of individual components and equipment items in the
system. The effect of the reliability of individual items in the system is relatively
straightforward. Reliability in general, particularly of electronic equipment, is
constantly improving, which facilitates achievement of the overall system goal.

e. The degree of system integrity enhancement necessary to assure confidence in
the FBW approach itself. FBW is a major technology breakthrough which is understandably
difficult to accept as an adequately reliable complete substitute for the familiar
mechanical system. Accordingly there may be a tendency to incorporate too much redundancy,
which in actuality will add complexity and cost and increase failure probabilities. It
is well to note here that each active electrical signal path requires an independent
secondary actuator channel. The point at which the redundancy level is effective and not
excessive is not readily apparent; however, the designer should exercise the utmost caution
to avoid possible compromise of reliability while attempting to provide additional
insurance.
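
The trade-off noted in item e can be made concrete with a small Python sketch. It rests on
simplifying assumptions that are not from the source: channels fail independently with an
identical probability p per flight, and the failure detection/correction logic is perfect.

```python
def p_any_failure(p: float, n: int) -> float:
    """Probability that at least one of n channels fails (maintenance burden)."""
    return 1.0 - (1.0 - p) ** n


def p_all_failed(p: float, n: int) -> float:
    """Probability that all n channels fail (loss of the function)."""
    return p ** n


if __name__ == "__main__":
    p = 0.01  # hypothetical per-channel failure probability
    for n in (1, 2, 3, 4):
        print(n, p_any_failure(p, n), p_all_failed(p, n))
```

Under these assumptions each added channel lowers the probability of losing the function
but raises the probability that some failure occurs, so redundancy beyond the level needed
adds failures, complexity and cost without a corresponding safety benefit.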


Redundancy level and approach have a direct impact on actuation, particularly on
secondary actuation, and few FCS reliability related considerations may be made
independently. Once the redundancy level and approach have been determined, the secondary
actuator is basically defined. In addition, failure detection/correction, feedback
implementation, transient protection, freedom from inadvertent channel shutdowns (when no
actual failure has occurred), control of failure degradation, etc. are functions which
must be accommodated within the actuator design.

Accordingly, it may be observed that the design and implementation of an
effective actuation system redundancy management concept is a very comprehensive
technical task. It is essential to incorporate redundancy with the FBW system concepts which
offer so much potential to near future aerial warfare capability. The capability exists
to achieve required reliability within reasonable risk and cost limitations. The degree
of success realized will be dependent on adequate background, design flexibility and
appropriate priority.

ACTUATION  SYSTEM  DESIGN  OBJECTIVES 


The following are suggested not only to obtain actuation system reliability,
but effective system performance, reasonable life cycle cost and improved maintainability
as well. This material is based on Air Force Flight Dynamics Laboratory (AFFDL) actuation
system related experience, particularly FBW associated programs, obtained through
development efforts dating back approximately twelve years. These recommendations are broad
in nature and do not reflect the many detailed considerations which must be made in the
process of designing a successful state-of-the-art system.

Reliability: 

To a CAS and particularly to a FBW system, adequate reliability is a most critical
factor since it is inherent to flight safety and mission effectiveness. Excluding the
back-up system approach, application of some redundancy concept to the secondary actuator
is the sole method by which the needed reliability can be practically obtained. Selection
of a basic concept and determination of necessary redundancy levels are the major decisions
to be made relative to definition of the actuator mechanizations. A sound and broad
background of information, preferably based on development program or actual experience,
will be pre-requisite to making the proper decisions. Less than optimum resolution of these
questions, in addition to compromise of safety and effectiveness, may also adversely affect
life cycle costs, complexity, size and weight. Thoroughly evaluate the data available from
all sources in conjunction with the previously discussed redundancy factors, then design
the actuation system mechanization to meet the requirements and enhance overall reliability,
rather than detracting from it.

Life  Cycle  Cost: 

Historically the actuation elements of a FCS have been relatively expensive;
however, present state-of-the-art sophistication can tend to aggravate the problem. Obviously
the FCS is a major aircraft system and can have an associated major impact on the total
weapon systems cost. Included in the life cycle cost are not only acquisition costs
(development, design, implementation and test) but operational logistics and maintenance
costs throughout the lifetime of the system. The impact of high costs is to limit weapon
systems procurement in some cases, if not result in cancellation of an entire program as
has happened. With the demands placed on the actuation system by performance, reliability
and other requirements, actuator associated expense is not surprising, particularly in
consideration of the precision electromechanical and hydromechanical components necessary.
A number of the below listed problems also constitute contributing factors. Although a
truly low cost actuator appears to be an unobtainable goal, substantial cost effectiveness
improvement is entirely practical. Accordingly, based on thorough evaluation, complete
and reasonable actuation system cost objectives should be established. The system should
subsequently be designed for compliance with these objectives.

Complexity: 

The causes of actuation system complexity are nearly synonymous with those of
high cost; where excessive complexity exists, so will excessive cost. Complexity seriously
compromises the achievement of effective reliability; for example, as the number of
components increases, the number of potential failures which could occur will similarly
increase. Excessive complexity can also adversely affect performance, mission effectiveness
and flight safety. Due to typically high performance level requirements, redundancy and
interfacing requirements, the military aircraft in particular will always exhibit a
relatively high level of complexity; however, all resources which may be applied to limit
complexity, while satisfying performance and other criteria, will be well spent. It is
[illegible] to note that the most straightforward design may also produce the highest
integrity.

Performance:

Actuation system performance is of primary importance [...] the vehicle; therefore,
it must be consistent with overall [...] new applications require improved actuator [...]
other hand, the techniques required are usually [...] Frequently better performance
simply means [...] limiting factor is not technically based, [...] emphasis is always
placed on performance while [...] regarded by many as unavoidable disadvantages. [...]
sophistication typical of actuation systems and [...] to apply reasonable controls.
Little or [...] the process and the benefits of reduced costs [...] the life of the
weapons system. The need for aircraft which are practical to buy, operate and maintain
is nearly as predominant as the need for superior performance.

Commonality:

Although the basic redundancy concept to be utilized on a given aircraft assures
some degree of inter-axis commonality, considerably more effort can be effectively utilized
to design actuators for more universal application. In most aircraft, basic similarity
is only typical in the roll axis primary controls where differences are usually limited
to those needed for right and left hand applications. Now with the advent of the F-16,
the same basic actuator is utilized to drive each horizontal stabilator, both flaperons
and the rudder surface. Differences are incorporated to accommodate stroke, force output
and rate requirements; however, one actuator and one servovalve specification completely
define the three configurations required for all surface applications. Modularization
concepts are particularly applicable to actuators. As above, a basic control manifold,
or valve module, can be designed for implementation with various ratings to match different
power actuator requirements. The idea of modularization is not new although application
to actuation systems is relatively limited. Any practical expedient to promote more
universal actuator or other equipment applications should be promoted.

Integration:

Frequently the actuation system for a given surface application will be implemented
in two physically separated assemblies, the power actuator and the associated secondary
actuator. Significant disadvantages with this approach are that mechanical anomaly will
be aggravated in proportion to the length of the signal path between the DRVs, the extent
of actuator interfacing, i.e. structural, electrical and hydraulic, may be
doubled, and costs will undoubtedly be greater. Hence, where it is at all practical,
integration of these major actuation system elements is strongly recommended. F-15
designers also adopted this approach which, compared to the non-integrated YF-16 actuators,
is expected to result in substantial per aircraft savings.

Back-Up  Systems: 

The implementation of a back-up system, which implies application of a dissimilar
method by which a given task may be accomplished, may be quite effective for selected
functions. Further, back-up mechanical controls may be an absolute necessity where, for
environmental or other reasons, a primary electrical control system could not be
self-sufficient.

On the other hand, implementation of a back-up mode will impose disadvantages; for
example, overall costs will probably be greater, performance will be degraded, and
complexity, weight and space requirements will be increased. Accordingly, if at all
practical, it is generally better practice to provide the needed reliability within the
basic primary system.

Sensitivity: 

This redundant system characteristic is related to the susceptibility of the
control mechanization to certain normal operating conditions which may appear to the
failure logic to be an actual failure. An effective actuation system must be designed to
avoid these nuisance failures which will cause the actuator to switch to an other than
normal mode of operation. Selection of failure detection levels must be made to accommodate
system pressure variations and other transient conditions without triggering the failure
detect/correct sequence. Inherent tolerances must be accurately estimated and the
resulting normal operation band cannot be too restrictive. Frequently a time delay is
successfully utilized to reduce the possibility of nuisance tripping. On the other hand,
once the alternate or standby mode of operation has been selected, latching provisions
should be incorporated to prevent migration back to the non-failed mode.
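
A minimal Python sketch of a failure monitor combining the three provisions named above
(a detection level, a time delay, and latching) is given below. The class name, the
expression of the delay as a count of consecutive samples, and all numeric values are
assumptions made for illustration only.

```python
class LatchingMonitor:
    """Hypothetical failure monitor with persistence delay and latching."""

    def __init__(self, limit: float, persistence: int) -> None:
        self.limit = limit              # failure detection level
        self.persistence = persistence  # consecutive samples required to trip
        self._count = 0
        self.tripped = False

    def update(self, error: float) -> bool:
        """Process one comparator error sample; return True once tripped."""
        if self.tripped:
            # Latched: no migration back to the normal mode.
            return True
        if abs(error) > self.limit:
            self._count += 1
        else:
            # A transient that clears resets the delay timer.
            self._count = 0
        if self._count >= self.persistence:
            self.tripped = True
        return self.tripped
```

A short excursion above the limit, such as a pressure transient lasting fewer samples than
the persistence count, does not trip the monitor, while a sustained mismatch trips it and
keeps it tripped even if the error later returns inside the limit.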

New vs Mature Equipment:

In the design of a new actuation system, some equipment of new design will be
required to fulfill specific requirements unique to the vehicle. As compared to the use of
existing, mature equipment, design, implementation and test costs will be higher, technical
risk is increased and reliability can only be estimated. In view of the inherent
disadvantages of new equipment, a concerted effort should be made to achieve an effective
mix of new and mature equipment in the system design.

Maintainability:

This problem is longstanding; however, the implication of many different and
highly sophisticated actuation systems is apparent; maintainability will not improve of its
own accord. It too needs appropriate emphasis in the design process if the designer is to
produce a relatively straightforward, foolproof and reliable actuator configuration which
complies with all requirements. Brief consideration of statistical maintenance data on
operational systems documents the need for improvement in this area.

Flexibility: 

Frequently, some manufacturers tend to apply a standard in-house approach to
actuation systems design. Where inadequate knowledge and experience with other alternative
concepts exists this may be the sole practical expedient, but one which may easily lead to
reduced system effectiveness, increased complexity or other compromise. To successfully
manage and apply modern actuation concepts to a new vehicle, no substitute will be found
for a broad and in-depth understanding of available technology. No single approach can be
universally applied; however, the experience and expertise which may be derived from a
continuing and comprehensive development program will provide the background and objectivity
necessary to define the optimum approach.

FCS/Actuator Specifications:

There are two generally applicable military specifications which will provide
substantial guidance to the actuation systems designer. The first is USAF specification
MIL-F-9490, "Flight Control Systems - Design, Installation and Test of Piloted Aircraft,
General Specification For." This specification was recently revised extensively in a
program sponsored by AFFDL. Revision D was released in June of 1975 and a comprehensive
user's guide (AFFDL-TR-74-116) has also been made available. The basic document is
compatible with modern FCS technology and as mentioned above adopts a systems design
approach. General FCS criteria not covered in detail specifications will be controlled by
this document on new AF piloted aircraft procurement programs. The specification and its
associated user's guide are quite comprehensive and will prove to be unusually effective,
informative and up-to-date.

In addition AFFDL has prepared an actuator specification entitled: "Actuators,
Aircraft Flight Controls, Power Operated, Hydraulic, General Specification For." The
document is presently in the process of industry review and is expected to be released
during 1977. (Military specification number not yet assigned.) Interim guidance is
available in a Society of Automotive Engineers (SAE) Publication, Aerospace Recommended
Practice (ARP) number 1281 entitled, "Servoactuators: Aircraft Flight Controls, Power
Operated, Hydraulic." etc.

The importance of technical specifications cannot be overstated. They assist in
definition of the lines of responsibility, provide a basis for arriving at an initial cost
breakdown and they describe the physical and other characteristics of the package to be
delivered. Package performance, reliability, maintainability and required tests and test
methods are usually fully defined. The two basic types of specifications, general and
detail, are each complementary to the other so that a component or assembly, i.e. EHV and
actuator respectively, will be required to comply with both and ultimately with the system
specification(s), i.e. MIL-F-9490. Detail specifications must accordingly be carefully
prepared to comply with specifications for the next higher order of assembly, etc. The
designer will acquire a detailed familiarity with these documents and will continue to
work closely with them throughout the design, implementation and test processes.

Design Approach:

Since the actuation system has a predominant effect on vehicle performance and
on the basic vehicle configuration, the actuation design must be given adequate
consideration early in the design process. Traditionally preliminary FCS design incorporates
the aerodynamic, propulsion and structural disciplines in determination of the initial
aircraft configuration. New actuator applications such as direct lift and side force control
enhance basic controllability; others allow reduction of inherent vehicle stability, provide
flutter mode control etc. through automatic actuation systems which can substantially affect
configuration and the primary disciplines. Accordingly, the FCS must be considered at an
earlier point in the design program so that the interactive effects may be fully defined
and best advantage taken of available FCS technology.

Similarly, the increasing trend to integrate the major systems involves another
change to the traditional design approach. The integration itself demands extensive
interdisciplinary liaison; however, an organizational structure broken down into separate
design groups tends to limit the necessary contact since there is little inherent
encouragement to cross the existing organization boundaries. Major success in implementing
new aircraft having integrated systems and otherwise demonstrating effective application of
available technology will be best accomplished via organizational structuring which
facilitates and encourages technical interchange. Designers must have a broader background
to be able to understand interactive effects on the total vehicle. Revision D of MIL-F-9490,
"Flight Control Systems - Design, Installation and Test," etc., emphasizes a total systems
approach, reflecting the interdependence of components, equipment and systems in recognition
of these changing requirements. (Ref i)

FBW Development Programs:

To effectively manage and apply modern actuation concepts to a new vehicle, no
substitute for a broad and in-depth understanding of available FBW and redundancy management
techniques will be found. Although a considerable amount of descriptive material exists
today, in the absence of sufficient past experience it is likely that a FBW development
program would be pre-requisite to optimum design of a new actuation system. Such programs
should be comprehensive, continuous and not limited to an investigation of a single approach
or concept. Such narrow objectives may have immediate benefits related to specific
applications; however, the airframe manufacturer - and the customer - should have a detailed
familiarity with all potential approaches. This capability will promote the objectivity
necessary to proper selection, as opposed to a continuation of past practices and techniques
which could be improved upon.

An adequate development program requires the commitment of substantial funding
and manpower resources; however, there are avenues by which these expenditures can be
maintained at practical levels. First, if a program is not initiated on a crash basis,
it may be conducted over a period of time with reasonable and relatively level manning.
Secondly, joint programs can be an efficient method by which experience and similar
objectives can be realized by the parties involved at a much lower cost to both. For
example, to support a laboratory actuation system test program, an equipment manufacturer
may find it to his advantage to provide the actuator to gain experience with a new concept
and establish a better competitive position which could lead to subsequent production.
The prime manufacturer conducts the laboratory program, presumably making pertinent test
data available to the actuator manufacturer. Thirdly, the military services commonly
support development programs as is in their best interest, since they are the ultimate
customer. This is undoubtedly a most significant factor with respect to the level of FCS
and actuation technology available today.

The approach that actuation system development programs should take must be all
encompassing, but obviously is dependent on experience to date. Keeping abreast of
available technology and current developments is the prime objective; however, actual
experience in the laboratory with redundant FBW actuation system equipment is even more
essential to maintaining an adequate background. Studies, analyses and claimed performance
may be verified and problems will be encountered and effectively resolved, avoiding the
associated expense of later disclosure during flight test, where worse consequences may
also be incurred. When the opportunity arises to propose, design and implement an actuation
system for a new application, it may then be accomplished efficiently on the basis of a
firm knowledge of the factors involved.

New Concepts:

Several new approaches having the potential to resolve some current actuator
problem areas are now being developed. A most important concept is a direct drive valve
mechanization where the actuator main control valve is driven electromechanically. This
allows elimination of the intermediate hydraulic amplification stages of secondary actuators,
drastically reducing the complexity and cost inherent to the conventional hydromechanical
secondary actuation approach.

Direct  Drive: 

Methods evolved to date for obtaining sufficient force electromechanically,
within a practically sized package, have been dependent on the use of samarium cobalt as a
high energy magnetic material. In a recently developed process, powdered samarium cobalt
is mechanically compressed, forming a material of superior magnetic strength with high
resistance to de-magnetization. The material has considerable potential for aircraft
application since it is expected to ultimately reduce the size of electric motors,
generators, and other transducer types as well.

Unique to the direct drive approach is the requirement that the electrical input
signals to the actuator must be of sufficient strength to provide the valve driving force
without supplementary amplification. Conventionally, an EHV in each channel converts a
low strength electrical input to a low strength hydraulic signal. The hydraulic signal
then drives a piston which produces a force multiplied, proportional mechanical output. A
relatively high force electromechanical transducer, utilizing the samarium cobalt magnetic
material, is substituted for the EHV and the hydraulic piston in the direct drive case.
Accordingly, the transducer force output itself is applied in the same manner as the
intermediate (secondary) hydraulic actuator output. The force which may thus be obtained
within workable size and weight limitations appears to be adequate for most applications.
Single or multiple transducer arrangements are practical to assure adequate force and
redundancy. Each transducer may also have up to four individual coils. Two transducers
driving a valve in a push-pull mode may be most attractive mechanically - provided that the
force-fight problem may be effectively overcome. The ultimate success of this concept with
respect to performance is yet to be determined; on the other hand, the potential to simplify
the actuator, reduce the requirement for precision hydromechanical components and very
favorably affect its cost is encouraging.

Two additional direct drive advantages are that driver/valve modules could be
developed to suit different requirements and the higher electrical signal power levels
would reduce susceptibility to electromagnetic interference. Initial direct drive
development has been accomplished in a series of programs sponsored by the U.S. Navy. An
actuator utilizing dual high output torque motors driving the control valve in a push-pull
mode has been implemented and tested in the laboratory. (Ref A) Development of a
flightworthy package to be flight tested soon is now in progress. The U.S. Air Force is now
actively pursuing two direct drive approaches similar to the above. One utilizes a single
linear motor drive, the other dual voice coil drivers operating in a push-pull mode. In
both, the drivers are assembled directly to the valve concentric to the spool centerline.
Flight test of both mechanizations being developed in the current U.S. Air Force programs
is planned for the near future.

Power-By-Wire (PBW):

There are two basic PBW approaches now being explored by the U.S. Air Force and
industry, both of which depend on electrical power to drive the control surface. The
first is the Integrated Actuator Package (IAP) and the second is the Electro Mechanical
(EM) actuator.

The IAP is a concept originally pursued by the U.S. Air Force primarily to improve
the survivability aspects of military aircraft, particularly fighters. The package contains
an essentially conventional actuator, but in addition incorporates its own modularized
hydraulic supply to eliminate dependence on the central hydraulic source. The need for
vulnerable hydraulic lines is eliminated, routing of electrical cables may be accomplished
with comparative ease, leakage problems are eliminated, interfacing is reduced and the
electrical power distribution system may be protected by circuit breakers.

The most promising IAP mechanization utilizes an electrically controlled servopump
supplied by an integral reservoir. The servopump consists of an electric motor
driving two pumps on a common shaft. A secondary pump maintains actuator stiffness while
the primary pump provides power on demand to drive the control surface. Secondary pump
displacement may be fixed or controlled as a function of primary pump output. The primary
pump swashplate is controlled directly from a FBW signal utilizing an EHV; alternatively, an
electromechanical transducer could be substituted. Control is thus quite convenient and, since
the unit is mechanized as a demand system, power consumption is approximately 1/3 less than
that of the conventional actuator. Power modulation of the unit is via destroking of the pump
pistons as a function of swashplate angle; the swashplate must also be capable of overcenter
operation to reverse flow direction to the actuator. Since the power is not valve modulated,
the inherent valve pressure drop and heat rejection are avoided. In comparison to the
standard secondary actuator this approach, like the direct drive, is also considerably
more straightforward and less expensive.

The EM actuator is also dependent on the samarium cobalt magnetic material to
develop high output torques within a reasonably sized package. The U.S. Air Force is
presently sponsoring a program to develop and implement equipment for test and
demonstration purposes.

There are two basic differences between the IAP and the EM actuator. First, the
EM actuator utilizes no hydraulics and is powered by two brushless DC motors having
samarium cobalt rotors. Secondly, electrical power to the actuator is modulated by a
controller upstream which integrates the electrical signal and the primary power inputs.
The two motors are mounted in-line axially at opposite ends of the package, driving the
surface through a centrally located planetary gearbox. The result is a cylindrical shape
of relatively small cross section which is installed and functions in a manner similar to
that of a power hinge. The complete EM drive includes the actuator and the remote
controller, which contains a transistorized commutation circuit. Like the IAP, this concept
will also be conservative of energy since it also employs an on-demand mode of operation.
The unit being developed will deliver up to three horsepower at the surface and is intended
for primary control application. The potential of this concept is good as an alternative
to hydraulically powered surfaces on high density aircraft.

Conclusions:

This chapter has attempted to touch upon presently significant considerations
involved in the design of PCS and actuators in particular. Because so many factors are
involved, individual treatments have been necessarily brief, although the implications should
be apparent. Technically, the state-of-the-art at its current level of refinement can
satisfy performance demands and nearly any other reasonable technical goals; however, a
need for marked improvement definitely exists in several specific areas.

Stringent and sophisticated redundancy techniques are needed to provide the
reliability levels essential to the advanced actuation systems of modern military aircraft.
Reduced actuator life cycle costs are needed to reduce total weapon system costs, thus
providing better assurance that future aircraft procurement programs will not be subject
to significant cuts or complete elimination as a direct result of excessive costs. Much
better actuator maintainability is required to improve aircraft availability and contribute
to lower support costs. These objectives presently should be of primary importance and
are inherent to achievement of the full potential of advanced technology and FBW in
particular.

FBW  and  electrical  control  techniques  are  proving  to  be  a catalyst  to  broaden 
controls  applications  and  the  functions  which  may  be  performed.  These  advanced  techniques 
are  the  key  to  removing  longstanding  constraints,  optimizing  basic  aircraft  configuration 
and  making  new  modes  of  control  practical.  Exceptionally  high  performance  levels  are 
possible,  control  is  more  precise  and  automatic  functions  which  extend  to  other  major 
systems  are  now  realistic  design  options. 

With this outstanding prospect, actual realization of the benefits now within
reach is dependent primarily upon how well the technology is implemented; this will require
reliable, cost effective systems and equipment which are practical to maintain. It is
suggested that a particularly good place to start is in the design of improved PCS actuators.

SUMMARY

The purpose of this paper is to outline the development procedures required to implement the
fly-by-wire flight control system in the F-16 aircraft. Several developmental efforts were
required to implement the flight control system in the aircraft. These efforts included the
design and development of specific hardware, including the sensors, actuators, and the flight
control computer itself. Once the subsystems were developed, the process of integration and
definition of the flight control system became a developmental effort. Once the hardware was
integrated into the aircraft, the developmental effort then swung towards on-aircraft tests to
ensure that the flight control system was compatible with the airframe within the operational
flight envelope. Once these ground tests were completed, the development effort then
concentrated on the flight test portion of the program, where the flight control system was
optimized for precision tracking in the air superiority and ground attack roles. The F-16
flight control system development was rather unique inasmuch as it was a two-fold effort. A
development effort was undertaken to ensure that the prototype aircraft could indeed meet the
safety of flight requirements, and then the effort swung towards full scale development of the
fly-by-wire flight control system for a production aircraft application. The process is
illustrated in Figure 1, which expands upon the development process and shows how the various
steps interact. It also demonstrates the iterative nature of the process.


SYSTEM  DESCRIPTION 

In order to understand the procedures required to develop a fly-by-wire flight control system, a
brief description of the system is necessary. The flight control system in the F-16 is a
fly-by-wire system in all three control axes. Each control axis has four separate electrical
channels and two hydraulic systems for redundancy purposes. There are no mechanical backup
controls in the flight control system. Each control axis has pilot input commands (i.e., pitch
and roll stick force or rudder pedal force) and each control axis employs active aircraft
feedback, so that the flight control system is an aircraft rate and acceleration command system
as opposed to a more conventional control surface position command system. During subsonic
flight the aircraft is statically unstable in the pitch axis, so this feedback or command
augmentation system is required in order to fly the aircraft. In addition to rate and
acceleration sensors to control the aircraft, the flight control system receives inputs from
the central air data computer to schedule flight control system gains as a function of flight
condition. In order to ensure full time operation of the flight control system, there are
seven electrical power supply sources that the flight control system can call on to power its
four channels of electronics in each axis. Additionally, there are two hydraulic systems in the
aircraft, with a backup emergency hydraulic pump to provide hydraulic power to the flight
control system in the event of an engine failure. Block diagrams of one channel of the
longitudinal, lateral and directional portions of the flight control system are shown in
Figure 2 for the YF-16 and Figure 3 for the F-16. Reference 1 contains a complete description
of the system. As shown in the figures, extensive use is made of non-linear shaping and flight
condition gain scheduling within the flight control system.
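
The gain scheduling mentioned above can be pictured as a lookup table interpolated against an
air data quantity. The following Python sketch is illustrative only: the breakpoints, the gain
values and the choice of dynamic pressure as the scheduling variable are assumptions made for
the example and are not F-16 data.

```python
from bisect import bisect_right

# Hypothetical schedule: (dynamic pressure in lb/ft^2, loop gain).
# These numbers are invented for illustration; they are not F-16 values.
SCHEDULE = [(100.0, 1.0), (400.0, 0.6), (1000.0, 0.3)]


def scheduled_gain(q_bar, schedule=SCHEDULE):
    """Piecewise-linear interpolation of gain, held constant outside the table."""
    xs = [x for x, _ in schedule]
    ys = [y for _, y in schedule]
    if q_bar <= xs[0]:
        return ys[0]
    if q_bar >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, q_bar)  # index of the first breakpoint above q_bar
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (q_bar - x0) / (x1 - x0)
```

In an analog computer such as the one described here, the same shaping would be realized in
hardware rather than in software; the sketch only shows the functional relationship.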

HARDWARE  DESIGN  AND  DEVELOPMENT 

The  first  portion  of  the  development  of  the  system  was  to  define  the  configuration  of  the  hardware 
to  be  used  in  the  flight  control  system,  or  the  design  process.  A great  deal  of  material  has  been 
written  about  the  system  design  philosophy.  The  key  design  points  will  be  touched  upon  as  they  relate 
to  the  development  process.  Because  this  was  the  first  fly-by-wire  flight  control  system  employed  in 
a production  aircraft,  the  flight  control  system  remains  separated  from  all  other  subsystems  within 
the  aircraft.  The  flight  control  computer  is  a four  channel,  three  axis  analog  computer  which 
represents  an  extension  of  current  high  gain,  high  authority  control  augmentation  system  technology. 

The computer that was built and developed for the prototype aircraft was a 32 board analog
computer employing integrated circuits and consisting of approximately 5500 piece parts. The
unit was designed to be forced air cooled, although it can operate for extended periods of time
without cooling air before failing. It was subjected to the standard qualification tests to
ensure structural integrity within the unit. The quad redundant computation system consists of
three active channels and a monitored channel. This represented an extension of the logic in
the F-111 Control Augmentation System. The design is such that at each voting point within the
computer, a mid-channel is selected as being the "good signal". In the event of a channel
failure, the monitor channel then switches in to become the third channel and the mid-channel
selection process continues. In the event of a similar failure at the same point of the
computer, the remaining two channels operate with the value closest to a null condition being
selected as the good channel. A rather interesting scheme of unit signal selection is employed
at each voting point in the system. The signals downstream of the voter are compared with the
upstream signal and then processed by an op-amp. The high and low signals drive the amps into
saturation and only the median signal is passed through the voter. The gyroscopes and
accelerometers used in the flight control system are also current state-of-the-art units. These
sensors are testable with torquing signals. The prototype program used the servo valves and
actuators from the F-111. This design was changed in the production aircraft to include
servoactuators integrated with the power control cylinder. All servo-actuators in the aircraft
are identical and the power actuators are identical with the exception of the rudder actuator.
Once again, these sensors and actuators were also subjected to bench qualification tests to
ensure that they would indeed operate reliably in the aircraft. Once the hardware design and
development process had started, the second development phase began.
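
The signal selection logic described above (mid-value selection among three channels, a
monitor channel that switches in after a first failure, and selection of the value closest to
null when only two channels remain) can be sketched functionally as follows. This is a
software illustration of the selection rule only; the actual computer is analog, and how a
channel is declared failed is not modeled here. The function names and the `failed` argument
are hypothetical.

```python
def select_signal(channels):
    """Select one signal from the currently healthy channel values.

    three or more values -> middle (median) value of the first three
    two values           -> the value closest to null (zero)
    one value            -> that value
    """
    if not channels:
        raise ValueError("no healthy channels")
    if len(channels) >= 3:
        return sorted(channels[:3])[1]
    return min(channels, key=abs)


def vote(active, monitor, failed):
    """active: three channel values; monitor: standby channel value;
    failed: set of indices of active channels already declared failed."""
    healthy = [v for i, v in enumerate(active) if i not in failed]
    if len(healthy) < 3 and monitor is not None:
        healthy.append(monitor)  # monitor channel switches in as a replacement
    return select_signal(healthy)
```

With no failures declared, a hard-over value such as 9.9 in `vote([1.0, 1.1, 9.9], 1.05, set())`
is rejected because the middle value 1.1 is selected. After two failures at the same point,
`vote([1.0, 1.1, 9.9], 1.05, {1, 2})` selects 1.0, the remaining value closest to null.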
SYSTEM DEFINITION AND INTEGRATION

This phase dealt with system integration and system definition. It made extensive use
of simulation facilities to accomplish two objectives. The first objective was to define the
control logic within the flight control system using an engineering simulation facility, and
the second objective was to integrate the hardware in the simulator facility prior to
installing it on the aircraft to ensure compatibility. The control law simulation was conducted
on a fixed base simulator to define and verify the control logic for use in the aircraft. A
small visual facility was made available for the purpose of evaluating handling qualities
during formation and gunnery tasks in the simulator. The process made use of stability and
control derivatives that were gathered from the wind tunnel. Several advanced control features
were defined and developed in the simulator to enhance handling characteristics. The
angle-of-attack limiter was defined and developed in the fixed base simulator. The yaw axis
sideslip rate feedback was defined. These two functions required concurrent development because
a controlled flight boundary exists in terms of angle-of-attack and sideslip. The simulator is
the ideal tool for this definition process. The time constant in the pitch rate feedback loop
and the aileron-rudder interconnect logic are additional parameters that were ideal candidates
for simulator development. In addition to refining the handling qualities in the operational
flight envelope, stall/spin characteristics and spin recovery techniques were also investigated
on the simulator. The second purpose of this simulation was to integrate flight hardware into
the control law simulator to ensure that it operated as prescribed. This also represented a
further refinement of the simulation, for it included the effects of the actual hardware as it
would perform in the aircraft. Selected failure modes were also inserted and the built-in-test
logic was validated in this phase of the program. This phase of the program replaced the
requirement for an "iron-bird" type flight control mockup simulation. These "iron-bird" type
tests were conducted concurrently with the handling qualities development tests.

AIRCRAFT  GROUND  TESTS 

The third series of development tests was conducted on the flight control system installed in
the airframe. These development ground tests consisted of the standard ground tests conducted
with a closed loop flight control system. Limit cycle, structural resonance and frequency
response tests, and also a series of electromagnetic interference (lightning) tests, were
conducted on the flight control system. Because the electrically implemented fly-by-wire flight
control system contains no direct mechanical linkage from the control stick to the flight
control surface, immunity from the effects of lightning strikes and other electromagnetic
interference was considered to be a major demonstration requirement to ensure safety of flight.
Several tests were conducted by increasing the duration and amperage of simulated lightning
strokes on the actual aircraft and measuring the resultant voltages at selected points in the
flight control system. Two series of tests were run. The first test used a maximum current of
260 amps and extrapolated the data to a maximum strike of 200,000 amps. The maximum induced
voltage observed during this test ranged from 0.3 to 1.8 volts. A second series of tests was
conducted using a peak current of 3000 amps. In these tests, maximum transient induced voltages
ranging from 0.4 to 13.8 volts were measured. The duration of these voltages was sufficiently
short that no interference with the system operation would be anticipated with a lightning
strike.

Additional compatibility tests that were performed were limit cycle and structural resonance
tests. The purpose of conducting the structural resonance tests was to ensure that no
structural coupling between the airframe and system existed. The purpose of conducting the
limit cycle tests was to ensure that adequate phase and gain margins were maintained within the
flight control system so that the system did not go unstable. As a result of these tests, some
modifications were made to the structural filtering in the flight control system. The
requirement to maintain stability margins with the large number of stores to be carried by the
production aircraft necessitated several pitch axis modifications, such as altering the normal
acceleration lag, reducing pitch rate gain, reshaping the dynamic pressure scheduled gain and
modifying the lead-lag networks. A structural mode was encountered in flight that was not
discovered during the ground tests. This was a coupling of a pitch mode in a wing tip mounted
missile with the lateral portion of the flight control system to produce an asymmetric bending
mode. This mode varied in frequency and damping with missiles on or off. A notch filter in the
lateral axis was modified to include the frequency range encountered in flight, and the problem
was eliminated. Once these ground tests were successfully completed, the aircraft was certified
as being safe for initial development test flights.
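
The effect of broadening a notch filter to cover a range of mode frequencies, as described
above, can be illustrated with a generic second-order notch. The transfer function form, the
6 Hz centre frequency and the damping values below are assumptions chosen for the sketch; the
source does not give the actual filter parameters.

```python
import math


def notch_gain(freq_hz, notch_hz, zeta_num, zeta_den):
    """Magnitude at s = j*w of the generic notch
    H(s) = (s^2 + 2*zeta_num*wn*s + wn^2) / (s^2 + 2*zeta_den*wn*s + wn^2)."""
    w = 2.0 * math.pi * freq_hz
    wn = 2.0 * math.pi * notch_hz
    s = complex(0.0, w)
    num = s * s + 2.0 * zeta_num * wn * s + wn * wn
    den = s * s + 2.0 * zeta_den * wn * s + wn * wn
    return abs(num / den)


if __name__ == "__main__":
    # Hypothetical case: notch centred at 6 Hz, structural mode shifted to 6.5 Hz.
    for zeta_den in (0.2, 0.7):  # larger denominator damping gives a wider notch
        g = notch_gain(6.5, 6.0, 0.05, zeta_den)
        print(f"zeta_den = {zeta_den}: gain at 6.5 Hz = {g:.3f}")
```

In this form the gain at the centre frequency equals zeta_num / zeta_den, and raising zeta_den
widens the attenuated band, so a mode that moves in frequency (for example with missiles on or
off) remains attenuated. The price is more phase lag at lower frequencies, which is why such a
change interacts with the stability margins discussed above.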

AIRCRAFT  FLIGHT  TESTS 

Two phases of testing were accomplished during the flight test program (Ref 2) to ensure that
the handling qualities were optimized for the air-to-air and air-to-ground roles and that the
flight control system did not produce any adverse effects during high angle of attack
maneuvering or degrade spin recovery capability in the out of control flight regime. Before
discussing these, it is rather interesting to note that the first change made to the flight
control system during the development process was a reduction of lateral sensitivity with the
landing gear down, prompted by an inadvertent first flight that took place during a high speed
taxi test. Here it was determined that the roll command gain, or lateral sensitivity, was too
high, and during the test the pilot was forced to take the aircraft off to avoid running off
the side of the runway. After this flight, the roll command gain was halved with the gear down.
The development of the flight control system then proceeded through the flight test by
optimizing the handling qualities of the vehicle using the air-to-air tracking test technique,
or handling qualities during tracking. In addition to gathering stability and control data, 26
data flights were required to optimize the system for air-to-air tracking. This was conducted
by using six pilots to provide handling qualities data that gave both qualitative and
quantitative data on the aircraft tracking capabilities. During the course of this program, 60
configuration changes were made to the flight control system to enhance the aircraft's
capabilities. A summary of the major changes made to the flight control system is listed in
Table 1.

The major changes that were accomplished included modification of the roll notch filter and
roll feedback gains, the pitch rate feedback washout time constant, the C-star blend ratio, and
stick force breakouts. Several evaluation flights were also conducted with various limited
displacements on the side arm controller. A synopsis of the tracking optimization process is
shown in Figures 4 and 5. These plots compare the F-16 with other current aircraft and test
programs to show that the side arm controller is capable of demonstrating excellent tracking
capability.

Once the system was optimized for flight within the controlled flight envelope, a high angle of
attack investigation was undertaken to ensure that the flight control system did not contribute
adversely to the spin recovery characteristics of the aircraft. Although the flight control
system does contain an angle of attack limiter, the angle of attack limit was exceeded on
several occasions during the flight test program. These out-of-control situations occurred at
very low airspeeds accompanied by high angular rates, or when abnormal control inputs were used
to maneuver the aircraft. The aircraft did enter a spin while the pilot attempted to roll the
aircraft with rudder only. Recovery was accomplished with pilot inputs. As a result of
exceeding these limitations, several modifications to the basic flight control system were
proposed. The first was to schedule pitch acceleration feedback with increasing angle of attack
to enhance the controllability at low airspeeds. The second recommended modification was to
reduce the amount of commanded roll rate as airspeed decreased, to reduce the magnitude of the
inertial coupling term.

PRESENT  STATUS 

The  prototype  (YF-16)  program  has  been  completed.  One  of  the  prototype  aircraft  has  been  fitted 
with  vertical  canards  and  is  currently  under  test  as  a control  configured  vehicle.  The  other  prototype 
aircraft  is  conducting  follow-on  avionics  testing  in  support  of  the  F-16  Full  Scale  Development  (FSD) 
program. 

The first FSD aircraft is about to be rolled out and has successfully completed ground testing.

CONCLUSIONS

The procedures used to develop the F-16 fly-by-wire flight control system are less time
consuming and costly, and allow for more design flexibility, than the procedures required to
develop a mechanically augmented "conventional" system. Hardware and control logic can be
developed concurrently, thereby reducing the complexity of hardware simulations and eliminating
the requirement for an "iron-bird". The design benefits of full authority fly-by-wire to
exploit active control technology are obvious. A developmental benefit is the ease with which
changes can be implemented via ground or in-flight system reconfiguration. This was
demonstrated in the flight test program. It is anticipated that the production aircraft
development cycle can be completed as effectively as the prototype program was.

TABLE 1

CONFIGURATION  CHANGES 

1. Modified roll rate command to allow 0.5 gain selection for reduced lateral sensitivity.

2. Rewired FCS to permit speed brake operation in flight with landing gear extended for better
SFO pattern control.

3. Rewired trim panel to produce 50 percent less roll trim rate to reduce roll trim sensitivity.

4. Increased rudder feel spring breakout force from 10 lb to 22 lb to reduce sensitivity.

5. Modified FCS to correct FCS/structural coupling and pitch sensitivity during tracking.
Changes included a roll notch filter, revised roll gains, and an increase of the pitch rate
wash-out time constant from 1.0 to 2.25 seconds.

6. Pitch rate wash-out time constant lowered from 2.25 to 1.5 seconds. A lead term was added to
the leading edge flap command to improve performance.

7. Installed 300-degree-per-second roll rate gyro to improve FCS lateral axis.

8. Reworked FCS panel to improve FCS self-test and changed the pitch rate wash-out time
constant from 1.5 to 1.0 seconds.

9. Decreased n^/6 gain from 5:1 to 2.5:1 and lowered alpha limiter from 28 to 26.7 degrees.

10. Stiffened the side stick by increasing roll force limits to ±15 pounds and increasing pitch
output to 30 pounds.

11. Changed pitch sensitivity by adjusting F^ prefilter time constant from 1/4 to 1/2.27.

12. Replaced fixed stick controller with linear pitch displacement stick.

13. Installed linear pitch displacement stick.

14. Removed displacement stick and installed force stick controller.

15. Eight-radian lag installed in pitch axis to reduce high-gain task pitch sensitivity.

16. Four-radian lag reinstalled in pitch axis.

17. Revised displacement stick to reduce travel by 50 percent.

18. Installed FCS computer with 8-radian lag in pitch channel.

19. 3.6-radian lag in roll filter installed in FCS.

20. Eight-radian lag in pitch channel reinstalled.
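
To give a feel for what the wash-out time constant changes in items 5, 6 and 8 mean, the
following Python sketch evaluates the unit-step response of a first-order wash-out and of a
first-order lag. It assumes the textbook forms tau*s/(tau*s + 1) and a/(s + a), and it assumes
that "eight-radian lag" denotes a lag with a break frequency of 8 rad/s; neither assumption is
confirmed by the table.

```python
import math


def washout_step(t, tau):
    """Unit-step response of the wash-out tau*s / (tau*s + 1) at time t (seconds)."""
    return math.exp(-t / tau)


def lag_step(t, break_rad_s):
    """Unit-step response of the first-order lag a / (s + a) at time t (seconds)."""
    return 1.0 - math.exp(-break_rad_s * t)


if __name__ == "__main__":
    for tau in (1.0, 2.25, 1.5):  # wash-out time constants quoted in Table 1
        y = washout_step(1.0, tau)
        print(f"tau = {tau:4.2f} s: wash-out output 1 s after a step = {y:.3f}")
    for a in (4.0, 8.0):  # assumed to be break frequencies in rad/s
        y = lag_step(0.25, a)
        print(f"{a:.0f} rad/s lag: output 0.25 s after a step = {y:.3f}")
```

A longer wash-out time constant holds the pitch rate feedback for longer after a sustained
input, while a higher lag break frequency lets the filtered signal follow its input more
quickly.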



JA-37 DIGITAL AUTOMATIC FLIGHT CONTROL SYSTEM (DAFCS)
SELF-TEST DEVELOPMENT

by

Government and Aeronautical Products Division
Honeywell Inc.
1625 Zarthan Ave.
St. Louis Park, Minn. 55416

and

Kjell Folkesson

SUMMARY

The SAAB-SCANIA JA-37 Viggen Interceptor, Figure 1, which enters service with the Swedish Air
Force in 1978, will be equipped with the world's first production Digital Automatic Flight
Control System (DAFCS). The DAFCS was designed and developed by Honeywell's Government and
Aeronautical Products Division under contract with Sweden's Air Materiel Department (FMV).
SAAB-SCANIA was largely responsible for development of detailed system requirements and
integration of the DAFCS with other aircraft systems.


Figure 1. SAAB-SCANIA JA-37 Viggen Interceptor


The DAFCS is currently in the final stages of flight test development, with series production
hardware deliveries scheduled to begin in 1977. Following is a description of DAFCS self-test
development, which produced flight line and in-flight self-test capability consistent with
demanding flight safety and built-in test effectiveness requirements.

Self-test development included verification of self-test effectiveness to a degree sufficient to --

• Support the decision that a single processor DAFCS mechanization could provide adequate
in-flight fail safety, and that mechanization of a redundant second processor was unnecessary
in the production JA-37 DAFCS, and to --

• Confirm that the DAFCS preflight self test was in compliance with maintenance performance
test/fault localization requirements.

Adequate in-flight fail safety was verified by demonstrating that the in-flight monitoring
function detects "potentially catastrophic" failure modes. ("Potentially catastrophic" failure
modes are those that produce transients in excess of specified limits.) Effectiveness in
detecting potentially catastrophic failure modes was demonstrated to be well above 99 percent.
Overall fault detection effectiveness, counting potentially catastrophic as well as
non-catastrophic failure modes, was closer to [illegible]. The effectiveness of the in-flight
monitoring function was demonstrated to be compatible with a 1 x 10^-6 catastrophic failure
probability for a 90-minute mission.

The [illegible] that demonstrated these results is described in the following paragraphs.
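
The relationship between detection effectiveness and the mission failure probability quoted
above can be sketched with a first-order estimate: the probability of an undetected,
potentially catastrophic failure is roughly the rate of such failures, times the mission
duration, times the fraction of them that monitoring misses. The failure rate used below is a
hypothetical figure chosen for illustration; the paper's actual rates and analysis method are
not reproduced here.

```python
def undetected_failure_probability(rate_per_hour, mission_hours, coverage):
    """First-order estimate P = rate * time * (1 - coverage), valid for rate * time << 1."""
    return rate_per_hour * mission_hours * (1.0 - coverage)


def required_coverage(rate_per_hour, mission_hours, p_max):
    """Smallest detection coverage that keeps the first-order estimate at or below p_max."""
    return max(0.0, 1.0 - p_max / (rate_per_hour * mission_hours))


if __name__ == "__main__":
    rate = 5.0e-5   # hypothetical rate of potentially catastrophic failures per hour
    mission = 1.5   # 90-minute mission, as stated in the text
    print(undetected_failure_probability(rate, mission, 0.99))
    print(required_coverage(rate, mission, 1.0e-6))
```

Under this assumed rate, 99 percent coverage gives about 7.5 x 10^-7 per mission, which is
inside a 1 x 10^-6 budget; a higher underlying failure rate would demand correspondingly higher
coverage.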

ABBREVIATIONS

DAFCS - Digital Automatic Flight Control System
FMV - Swedish Air Materiel Department
CAS - Control Augmentation System
CPU - Central Processor Unit; the Honeywell HDC-301, a single-card digital processor
WDT - Watchdog Timer
LSIC - Large-Scale Integrated Circuit
I/O - Input/Output
A/D - Analog to Digital
D/A - Digital to Analog
DCM - Dynamic Computation Monitor
FMEA - Failure Modes and Effects Analysis
FMET - Failure Modes and Effects Testing
P_CF - Probability of Catastrophic Failure
BIT - Built-In Test
LRU - Line Replaceable Unit


DAFCS  DESCRIPTION 
DAFCS  Functional  Configuration 

The Digital Automatic Flight Control System (DAFCS), Figure 2, is a high-authority, fail-safe,
single processor digital flight control system that provides the following functions:

• Control Augmentation System (CAS), Normal Mode
  - Stability Augmentation
  - Transonic Trim Change Compensation

• Control Augmentation System, Aiming Mode

• Attitude Hold
  - Pitch Attitude Hold
  - Roll Attitude Hold
  - Heading Hold
  - Control Stick Steering

• Altitude Hold

• Automatic Airspeed Control


[Figure 2 block diagram; legible labels: mode discretes, aircraft digital subsystems]

Figure 2. JA-37 DAFCS Mechanization

The DAFCS is mechanized with inputs from dual analog sensors for the control augmentation
function, and with single serial digital reference signals for attitude control modes. The
single HDC-301 digital central processor unit (CPU) performs control law computation,
monitoring, and test functions, and outputs command signals to the single control surface
servos.

DAFCS  Self-Test  Configuration 

The JA-37 DAFCS self test is designed to provide in-flight fail safety as well as performance test/fault
localization to facilitate system maintenance. As shown in Figure 3, the DAFCS self test is an
integral part of the JA-37 system test concept, which includes:

• In-flight  Monitoring 

• Flight  Line  Test 

  - Power-Up Test/First Line Check
  - Internal Performance Test/Internal Fault Localization

• Depot Level Test

  - Performance Test
  - Fault Localization to Shop Replaceable Unit


Figure 3. JA-37 DAFCS Self-Test Organization (diagram levels: in-flight monitoring; flight line
test (BIT), comprising power-up test, first line check (preflight test), and internal performance
test/internal fault localization; depot level test, comprising performance test/fault localization
to shop replaceable unit)


DAFCS  in-flight  monitoring  and  flight  line  testing  are  accomplished  via  built-in  tests  performed 
by  the  DAFCS  digital  computer.  For  depot  level  testing,  built-in  test  is  augmented  by  external  automatic 
test equipment for fault localization to shop replaceable unit (e.g., printed circuit board).

DAFCS  Self-Test  Requirements  --  DAFCS  self-test  requirements  can  be  summarized  as  follows: 

• In-flight Monitoring, Fault Detection - Probability of undetected failure resulting in
"potentially catastrophic" failure shall be less than 1 x 10^-6 during a 90-minute mission.
"Potentially catastrophic" failure is defined as:

  - Normal Acceleration, > 2 g incremental
  - Lateral Acceleration, > 0.5 g
  - Other parameters

• In-flight Monitoring, Nuisance Disengage - Probability of nuisance disengage shall be
less  than  1 x lO"*’  during  a 90-minute  mission. 

• Flight Line Performance Test - One hundred percent check of in-flight monitoring and
disengage mechanisms required. Overall fault detection probability must be greater than
95 percent, where a "fault" results in performance outside functional requirements.

• Flight Line Fault Isolation - For detected faults, fault localized to line replaceable unit
(e.g., a black box) with 95 percent confidence.

• Depot Level Test - ATE requirements only.


DAFCS Self-Test Mechanization, In-flight Monitoring -- DAFCS in-flight monitoring, Figure 4,
must be fast and comprehensive to meet requirements because the DAFCS authority is potentially
destructive; i.e., authority in the low-altitude, high-speed flight regime is approximately 10 g. A
hardover failure will produce transients in excess of requirements if left undetected for more than
0.1 second at high-speed flight conditions. DAFCS in-flight monitoring can be summarized as follows:

• Sensor Monitoring

  - Comparison monitoring of feedback signals from dual-redundant inner-loop CAS sensors
    (rate gyro packages, normal and lateral accelerometer packages, stick force and stick
    position transducers).

  - Parity and format checks on digital inputs from nonredundant outer loop sensors, and
    software limiting of outer loop sensor signals.

  - Monitoring of critical gains using an independent air data source to protect against air
    data computer failures.

• CAS Servo Monitoring

  - Comparison of actual performance against digital models.

• Input Circuitry Check

  - Accomplished by software comparison of A/D converted power supply references to
    known constants.

• HDC-301 CPU Monitoring

  - Self Test - Exercises all instructions and control critical to flight safety.

  - Dynamic Computation Monitoring (DCM) - A continuous check of critical HDC-301 CPU
    and I/O functions by CPU dynamic control of an external independent analog element.
    CPU failures affecting CPU dynamic computation capability will trip this monitor.

  - Watchdog Timer (WDT) - Verifies that the computation cycle is completed within a
    specified time limit.

  - Real Time Clock Monitor - Assures that the computation period has not exceeded a
    nominal value.

• Memory Monitoring

  - Parity checks on all words accessed from memory.

  - Periodic sum checks of critical instruction, constant, and scratchpad locations.

  - Continuity monitoring to assure proper program flow through critical instructions.


Figure 4. JA-37 DAFCS In-Flight Monitoring
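The memory monitoring items listed above (a parity check on every word accessed and periodic sum checks of critical locations) can be illustrated with a minimal Python sketch. It is not the HDC-301 implementation: the 16-bit word width, the parity convention, the modular checksum, and every function and variable name are assumptions made for illustration only.

```python
WORD_BITS = 16  # hypothetical word width; the source does not state the HDC-301 word length


def parity_bit(word: int) -> int:
    """Return 1 if the word has an odd number of set bits, else 0."""
    return bin(word & ((1 << WORD_BITS) - 1)).count("1") & 1


def parity_ok(word: int, stored_parity: int) -> bool:
    """Check a word read from memory against the parity bit stored with it."""
    return parity_bit(word) == stored_parity


def sum_check(words: list[int]) -> int:
    """Modular sum over a block of critical locations (instructions, constants)."""
    return sum(words) % (1 << WORD_BITS)


# Reference values computed once, when the memory image is known to be good.
program = [0x1A2B, 0x0F0F, 0x8001, 0x7FFF]
stored_parity = [parity_bit(w) for w in program]
reference_sum = sum_check(program)

# A single-bit fault flips the parity of the affected word and changes the block sum.
corrupted = list(program)
corrupted[2] ^= 0x0004
parity_fault_detected = not parity_ok(corrupted[2], stored_parity[2])
sum_fault_detected = sum_check(corrupted) != reference_sum
```

In this sketch a double-bit error within one word would preserve parity but still change the block sum, which is one reason for layering the two checks.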


DAFCS Self-Test Mechanization, Flight Line Testing -- The DAFCS flight line test includes the
following:

• Processor Self Test

• Memory Test

  - Parity Checker Test
  - Memory Sum Check
  - Memory Addressing Test

• A/D, D/A Converter Test

  - Calibration test via wrap-around testing.

• HDC-301 CPU Monitor Test

  - Test of Watchdog Timer for proper time-out period.
  - Simulation of CPU failure to check for proper DCM failure indication.

• Disengage Circuitry Test

  - Attempt engagement with WDT, DCM failed. Verify no engagement.

• Power Supply Reference Test

• Input/Output Signal Conditioning Test

  - Stimulate all input electronics and measure dynamic response.
  - Check output electronics via wrap-around testing.

• Sensor Tests

  - Stimulate sensors, measure dynamic response.
  - Interrogate digital sensor data valid signals.

• Servo Tests

  - Exercise all servos to measure dynamics, rate and position limits, thresholds, etc.

A power-up  test  verifies  DAFCS  in-flight  monitoring  capability  by  using  a sequence  of  the  described 
resident  tests  automatically  at  power  on.  A first  line  check,  which  is  also  resident  in  DAFCS  memory, 
is  a sequence  of  the  described  tests  initiated  by  pilot  command  to  determine  the  system's  operational 
status.  An  internal  performance  test/internal  fault  localization  test  program  employs  the  described  tests 
expanded  for  fault  localization  as  loaded  into  DAFCS  memory  from  a portable  tape  unit  and  executed  via 
a cockpit  control  panel.  Measurement  values  and  test  position  numbers  are  sent  to  the  aircraft  central 
computer  for  recording. 

DAFCS  SELF-TEST  DEVELOPMENT 

Following  is  a description  of  the  various  tasks  included  in  the  definition,  analysis,  and  verification 
of  DAFCS  in-flight  monitoring  and  flight  line  test  functions,  as  summarized  in  Figure  5,  JA-37  DAFCS 
Self-Test  Development. 

In-Flight  Monitoring  Development 

DAFCS in-flight monitoring development was initiated prior to the formal JA-37 Development Program
contract award in May 1973. Various monitoring concepts had been defined and analyzed, and a
prototype Honeywell digital flight controller had been flight tested in a SAAB-SCANIA AJ-37 aircraft.
Pre-award cost tradeoff studies had indicated that the DAFCS should be mechanized as a single processor
digital configuration.

The contracted JA-37 DAFCS Development Program was aimed at systematic, orderly development
of such a single processor digital configuration. The program included prototype and preproduction
hardware development and test phases. The prototype hardware intended for flight test in a JA-37
aircraft (modified AJ-37 aircraft) was to be mechanized as a dual processor configuration. Thus the
dualized prototype hardware could be undergoing functional flight test development while the critical
single processor monitoring functions were developed and verified in preparation for single processor
preproduction system flight test in aircraft JA-37-8 (first JA-37 preproduction airframe). Preproduction
hardware was designed to allow optional dual CPU mechanization, but this option was not exercised as
the safety of the single processor configuration was satisfactorily verified.

The  actual  development  program  schedule  is  presented  in  Figure  5.  Key  decision  dates  in  this 
in-flight  monitoring  development  program  included: 

• JA-37-21 First Flight Approval, June 4, 1974 - Decision was made that the flight test prototype
safety configuration, with dual processors, had been sufficiently verified by analysis
and testing to allow the 37-21 DAFCS to fly.

• Safety Configuration Review, November 21, 1974 - Safety specification compliance was
established. This was based on the calculations of catastrophic failure rate using agreed
failure rate data and agreed mission time, failure transient magnitude data from simulation
and flight test, and calculated nuisance disengage probability.

• JA-37-8 First Flight Approval, October 1, 1975 - Decision was made that the single processor
safety configuration had been sufficiently verified by analysis, ground testing, and flight testing
to allow the single processor 37-8 DAFCS to fly.

The  safety  configuration  development  involved  a preliminary  round  of  verification  activity  prior  to 
prototype  hardware  flight  test  in  the  JA-37-21  aircraft.  This  activity  included  functional  level  failure 
mode  effects  analysis  (FMEA),  prototype  hardware  safety  studies,  and  rig  studies  verifying  the  ability 
of  the  preproduction  safety  configuration  to  detect  virtually  all  potentially  catastrophic  failure  modes. 

Figure 5. JA-37 DAFCS Self-Test Development

The preproduction safety configuration verification process required:


1. Analysis of integrated circuit failure modes at the gate level to characterize faults in terms
of "stuck at" output pins.

2. Simulation with actual hardware of these "stuck at" failure modes where feasible, by opening
or shorting integrated circuit output pins and verifying detection by DAFCS self-test software.
Any remaining "stuck at" output pin faults were analyzed for detectability.

3.  Documentation  of  effects  of  all  failures.  For  undetected  failures,  noncatastrophic  effects  on 
aircraft  were  confirmed  via  simulation  studies. 

Key  tasks  in  this  safety  configuration  development  are  described  in  the  following  paragraphs. 

Safety System Definition -- Monitor functions were defined to detect conceivable faults in all the
DAFCS hardware components. Computation rates for monitoring functions were established based on
fault reaction time requirements and on the number of fault indications required prior to fault reaction,
the latter being set to minimize nuisance disengage probability. Monitoring of analog devices (e.g., dual
sensor comparison monitoring, servo to servo model comparison monitoring) required at least three
successive miscompares prior to fault reaction (disengagement). Digital hardware monitoring functions
(e.g., CPU self test, memory sum check) activate fault reaction after a single fault indication.

The monitor function computation rate was set sufficiently high that the time required for fault
identification (e.g., three computation cycles for analog devices) plus the time required for fault reaction
(e.g., recentering servo actuators) was short enough to restrict failure transients to allowable limits.
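The successive-miscompare rule and the timing argument above can be sketched in Python as follows. Only the three-successive-miscompare rule for analog devices and the 0.1 second hardover limit come from the text; the class and function names, the threshold, the latching behavior, the 10 ms cycle period, and the 30 ms reaction time are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ComparisonMonitor:
    """Monitor that trips after `trip_count` successive miscompares."""
    threshold: float          # allowed |a - b| before a sample counts as a miscompare
    trip_count: int = 3       # three successive miscompares for analog devices, per the text
    count: int = 0
    tripped: bool = False

    def update(self, a: float, b: float) -> bool:
        if abs(a - b) > self.threshold:
            self.count += 1
        else:
            self.count = 0    # any agreeing sample resets the persistence counter
        if self.count >= self.trip_count:
            self.tripped = True   # latched here; latching is an assumption of this sketch
        return self.tripped


def worst_case_response_s(cycle_s: float, trip_count: int, reaction_s: float) -> float:
    """Fault identification time (trip_count cycles) plus fault reaction time."""
    return trip_count * cycle_s + reaction_s


monitor = ComparisonMonitor(threshold=0.5)
# Two miscompares, one agreeing sample, then three successive miscompares.
samples = [(1.0, 2.0), (1.0, 2.0), (1.0, 1.1), (1.0, 2.0), (1.0, 2.0), (1.0, 2.0)]
history = [monitor.update(a, b) for a, b in samples]

# Hypothetical 10 ms cycle and 30 ms reaction time against the 0.1 second limit.
response = worst_case_response_s(cycle_s=0.010, trip_count=3, reaction_s=0.030)
within_limit = response < 0.1
```

A digital hardware monitor that reacts to a single fault indication corresponds to `trip_count=1` in this sketch.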

Hardware and software were developed to provide a prototype safety configuration that consisted of
a single processor DAFCS mechanization with a redundant processor used to duplicate the CAS
computations and provide additional safety during the early developmental flying in the 37-21 aircraft. Special
emphasis was placed on making the mechanization safe without the second processor such that the second
processor could be eliminated with confidence after early test flying in the 37-21 aircraft.

Functional  Level  FMEA  --  The  functional  level  FMEA  included  analysis  of  each  functional  block  of 
the  DAFCS  to  determine  failure  modes  and  the  corresponding  failure  effects.  Detectability  of  each  failure 
mode  by  in-flight  monitoring  was  determined.  The  functional  level  FMEA  was  the  first  round  of  analyses 
of  monitoring  configuration  development.  The  functional  level  FMEA  was  followed  by  detailed  level 
FMEA,  which  served  to  verify  the  fully  developed  system  design. 

The detailed level FMEA consisted of subdividing the DAFCS into 56 functional blocks and identifying
all failure modes associated with each block. Of the 343 failure modes identified, 126 were determined
"potentially critical", and these became the focal point for further investigation. Where there was
a reasonable chance that a system safety or reliability improvement could be identified by going to a lower
level or where a specific portion of the FMEA was identified as necessary for a design decision, the
functional level FMEA was expanded.

Prototype Safety Studies -- Safety studies were designed to identify DAFCS failures with the potential
to produce the most significant aircraft transients. These failures were further evaluated during the
"rig" studies, described later. Prototype hardware was employed to evaluate the ability of the in-flight
monitoring functions to detect a broad set of simulated hardware failures and cause system disengagement
consistent with flight safety requirements. The DAFCS unit was programmed with the flight software,
which included the in-flight monitoring functions. Failures were simulated, and the in-flight monitoring
reaction to these failures was observed. The following monitoring functions were examined during
this study:


• Dual  Element  Monitoring 

• I/O  Monitoring 

• Memory  Monitoring  (parity,  sum  check,  etc) 

• CPU  Monitoring  (self-test  routine,  dynamic  computation  monitor) 

• Engage/Disengage  Circuitry 

Hardware failures were simulated during this study by using a special card extender containing
switches which opened card connector pins to simulate failures. Memory and CPU failures were simulated
by modification of the operational software.

The output data from the study defined the time interval between failure introduction and the discrete
output command for CAS servo disengagement for the various failure modes simulated. This failure
detection time interval was compared to allowable detection time (known detection time which resulted in
excessive failure transients) to determine adequacy of the monitoring functions.

Rig Studies -- Studies were conducted to verify the DAFCS dual CPU prototype in-flight monitoring
configuration in preparation for flight test in aircraft JA-37-21. These studies utilized the SAAB-SCANIA
JA-37 "rig", consisting of an aircraft mechanical model, a six-degree-of-freedom digital simulation
of the aircraft dynamics, and actual prototype DAFCS electronics.

The purpose of the rig study was to evaluate the safety configuration in a realistic situation to gain
confidence before going to the aircraft. These studies investigated the most severe failures identified in the
prototype safety study and concentrated on sensor and servo monitors. Servo monitoring optimization
was also performed using actual JA-37 mechanical system hardware.

Prototype DAFCS hardware was installed in the rig and checked out to ensure that it performed as
designed with respect to both hardware and software. The system was then exercised to investigate the
risk of nuisance disengagements. Simulated failures were then introduced into the servos and sensors
to evaluate the performance of the applicable monitor and the resulting failure transient. The effect of
simulated failures in the outer loops was also evaluated.

JA-37-21 Flight Test -- In addition to functional performance evaluation, JA-37-21 flight test
included specific safety tests to:

• Verify nonsusceptibility to nuisance disengagement, and

• Verify that representative in-flight failure transients corresponded to those obtained in
simulation studies, thus validating the extensive simulation study results.

Hardware Tolerance Analysis -- DAFCS hardware tolerances were computed from component
tolerances for use in monitoring nuisance disengage analysis and for use in establishing preflight test
tolerances.

Nuisance Disengage Probability Analysis -- As a basis for this activity, nuisance disengagements
are defined as disengagements that occur with all system elements working as designed; i.e., performing
within specifications and in the specified environment for the device. When a device such as a rate
gyro or servo causes a disengagement as a result of operation outside specified characteristics, this is
a failure and not a nuisance disengagement. Within a digital system there are certain elements that do
not have performance variations even with specified variations in such things as power supplies,
temperature, etc. DAFCS monitoring functions which monitor these elements include the CPU self test,
memory sum check, and memory parity error detector. Nuisance disengage probability associated with
these monitoring elements should be zero, and this was confirmed by consideration of the design margins
of the electronics involved.

Based on these ground rules, the nuisance disengagement analysis considered tolerances in the
following devices:

• Dual  Sensors 

• Servos 

• DAFCS  Electronics 

The analysis identified necessary changes to monitoring thresholds. With changes implemented
as needed to satisfy nuisance disengage requirements, quantitative probability of nuisance disengagement
associated with each of the stated devices was computed. The resulting DAFCS total nuisance
disengagement probability, the sum of the individual parts, was then computed to be within specified limits.
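The relationship between monitoring thresholds, device tolerances, and a summed nuisance disengage probability can be sketched as follows. This is an illustrative model only, not the analysis used in the program: it assumes a zero-mean Gaussian mismatch between compared signals, statistically independent successive samples, and hypothetical threshold and standard deviation values in arbitrary units. The result is a probability per monitoring opportunity, not a per-mission figure.

```python
import math


def exceed_probability(threshold: float, sigma: float) -> float:
    """P(|d| > threshold) for a zero-mean Gaussian mismatch d with standard deviation sigma."""
    return math.erfc(threshold / (sigma * math.sqrt(2.0)))


def trip_probability(threshold: float, sigma: float, successive: int = 3) -> float:
    """Probability that `successive` independent samples all exceed the threshold."""
    return exceed_probability(threshold, sigma) ** successive


# Hypothetical (threshold, sigma) pairs for the three device groups named in the text.
devices = {
    "dual sensors": (4.0, 1.0),
    "servos": (4.5, 1.0),
    "DAFCS electronics": (5.0, 1.0),
}
per_device = {name: trip_probability(t, s) for name, (t, s) in devices.items()}
total = sum(per_device.values())   # total taken as the sum of the individual parts
```

Under these assumptions, widening a threshold relative to the tolerance spread reduces the nuisance trip probability steeply, at the cost of a larger undetected mismatch; the analysis described above balanced the two.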

Detailed Level FMEA -- DAFCS functional blocks containing potentially critical failure modes
were examined at the piece-part level. Excluding the CPU, whose FMEA is discussed below, more
than 3,700 piece-part failure modes were identified, and their probability of failure and failure
detectability were investigated.

HDC-301 CPU FMEA -- The HDC-301 CPU contains 15 LSIC's of 10 different types plus more
standardized circuitry for the interface functions. Each LSIC has 42 input/output pins. This study
examined the effects that failure of the functional logic elements contained in each LSIC would have at
the output pins, the impact of these failures on overall CPU operation, whether the failure effect would be
detected by the CPU self-test program, and how an undetected failure would affect the DAFCS operation.

The CPU FMEA was initiated with analysis of LSIC at the gate level to characterize all gate failures
in terms of LSIC output pin "stuck at" faults. A total of 2158 LSIC "stuck at" failure modes were
defined to encompass all single gate failures, each involving one, two, three or more LSIC output pins.

All LSIC "stuck at" failure modes involving up to three pins were simulated with actual hardware,
and detectability by actual CPU self-test software was verified (see next paragraph). LSIC failure modes
affecting four or more pins (666 modes) were analyzed for detectability by CPU self-test software.

HDC-301 CPU Failure Mode Effects Testing (FMET) -- An actual DAFCS electronic unit and a
production HDC-301 were used to simulate failures in the CPU to determine the effectiveness of monitor
functions in detecting these failures. Failures within the LSIC's were simulated by lifting the leads and
introducing open or hardover failures on the output pins. The leads were lifted one, two, and three at a
time and combinations of possible failure-induced output conditions were evaluated. More than 2,000
LSIC failure modes were simulated in this manner. In addition, another 300 failure modes were evaluated
by placing the CPU on an extender such that the connector pins could be opened for the insertion of failure
simulations to complete the testing of possible CPU failure modes.
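The idea of exercising output pins one, two, and three at a time with each possible stuck condition can be illustrated with a short enumeration in Python. This is a sketch under stated assumptions: the function name and the 4-pin example device are hypothetical, and exhaustive enumeration is only an upper bound on the test set, since the study derived its specific failure modes from gate-level analysis rather than from all combinations.

```python
from itertools import combinations, product


def stuck_at_modes(pins, max_pins=3):
    """Yield every assignment of stuck-at-0/1 to subsets of 1..max_pins output pins."""
    for k in range(1, max_pins + 1):
        for subset in combinations(pins, k):
            for states in product((0, 1), repeat=k):
                yield tuple(zip(subset, states))


# Illustrative 4-pin device: C(4,1)*2 + C(4,2)*4 + C(4,3)*8 combinations.
modes = list(stuck_at_modes(["p0", "p1", "p2", "p3"]))
mode_count = len(modes)

# Figures quoted in the text: 2158 modes in total, 666 of them affecting four or more pins.
total_modes = 2158
four_or_more = 666
up_to_three = total_modes - four_or_more   # modes involving one, two, or three pins
```

By subtraction, the quoted figures leave 1492 of the defined "stuck at" modes involving up to three pins.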

In-flight Monitoring Effectiveness Verification -- Effectiveness of the JA-37 DAFCS single processor
preproduction monitoring configuration was verified by simulating the vehicle's reaction to detected and
undetected failures. To accomplish this, an actual DAFCS electronics unit was tied into an analog simulation
of the vehicle dynamics, the sensors, and the servos; and the flight control software and all
monitoring functions were loaded into the DAFCS core memory. System failures were simulated using
the extended card and open pin technique and by lifting components on the printed circuit boards, by injection
via the analog simulation, and through the use of special software modifications. Aircraft failure
transients were observed on the simulation and verified to be within specified limits.

Safety Analysis -- The purpose of the safety study was to analyze all identified functional failure
modes of the DAFCS to provide a basis for calculating the probability of catastrophic failure (P_CF). The
analysis identified critical failure modes for each function (i.e., those failure modes that are potentially
catastrophic), determined the detectability of these modes by in-flight monitors and ground test,
calculated the probability of the mode occurring, and determined the model by which to calculate P_CF.

All identified critical failures fall into one of three classes:

• Detectable  in  flight  and  requiring  safe  disengagement  of  the  DAFCS. 

• Defeating  the  monitor  and/or  disengage  function  such  that  safe  disengagement  cannot  occur, 
if  required. 

• Undetectable  in  flight  and  potentially  catastrophic. 

These  three  classes  of  failure  are  represented  in  the  safety  calculation  model  employed  in  the  DAFCS 
safety  analysis. 

The  safety  analysis  was  accomplished  by  performing  the  following  tasks,  which  are  shown  in  flow 
chart  form  in  Figure  6. 

1.  Identify  functions  and  determine  the  safety  criticality  of  each,  using  a prototype  safety  study, 
rig  studies,  and  analysis. 

2.  Determine  the  failure  modes  for  all  safety  critical  functions.  These  were  initially  identified 
from  the  functional  level  analysis  and  revised  based  upon  the  detailed  piece-part  analysis. 

3. Determine those functional level failure modes that are safety critical; i.e., could cause a
catastrophic failure if undetected, could inhibit detection of a critical failure, or prevent
disengagement.

4. Determine detectability of all safety critical functional failure modes. This was initially done
analytically and subsequently verified by monitoring effectiveness verification and HDC-301
failure modes effects testing. Results were fed back to improve detection by making appropriate
changes to the system mechanization, the monitoring mechanization, and/or preflight
test methods and procedures.

5. For each critical failure mode, the probability of occurrence was calculated using parts count,
failure modes, failure rates, and hazard times. Again, the results were used to determine
whether system changes were required to meet the safety requirements.

6. The critical failure probabilities were summed according to the appropriate block of the
safety model to which they belonged, and the final value of P_CF was calculated.

7.  Analysis  of  CPU  failure  modes  was  handled  somewhat  differently  as  this  analysis  was  carried 
down  to  the  individual  functional  element  level  within  each  of  the  15  LSIC  chips  which  constitute 
the  CPU.  The  failure  modes  of  each  functional  element  were  analyzed,  and  an  assessment  was 
made  as  to  the  criticality  and  detectability  of  each  mode.  Several  tests  were  added  to  the  flight 
program  and  preflight  test  program  to  detect  all  identified  failure  modes. 
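Steps 5 and 6 above (probability of occurrence from failure rates and hazard times, then summation by block of the safety model) can be sketched in Python. The source does not give the safety calculation model, so the combination rule below is an assumption: undetectable critical failures contribute directly, while a detectable critical failure is treated as catastrophic only when a monitor-defeating or disengage-defeating failure is also present. All function names, failure rates, and hazard times are hypothetical.

```python
import math


def failure_probability(rate_per_hour: float, hazard_time_h: float) -> float:
    """Probability of at least one failure in the hazard time, constant failure rate."""
    return 1.0 - math.exp(-rate_per_hour * hazard_time_h)


def block_probability(modes) -> float:
    """Sum small failure-mode probabilities within one block of the model."""
    return sum(failure_probability(rate, hours) for rate, hours in modes)


def catastrophic_failure_probability(undetectable, monitor_defeating, detectable) -> float:
    """Assumed model: undetectable critical failures count directly; a detectable critical
    failure is catastrophic only if a monitor/disengage-defeating failure is also present."""
    return block_probability(undetectable) + (
        block_probability(monitor_defeating) * block_probability(detectable)
    )


# Hypothetical (failure rate per hour, hazard time in hours) pairs; not program data.
undetectable = [(1.0e-7, 1.5)]
monitor_defeating = [(1.0e-6, 50.0)]   # assumed latent between ground tests: longer hazard time
detectable = [(2.0e-4, 1.5)]

p_cf = catastrophic_failure_probability(undetectable, monitor_defeating, detectable)
meets_requirement = p_cf < 1.0e-6
```

The longer hazard time on the monitor-defeating block illustrates why preflight tests of the monitors and disengage mechanism matter in such a model: they bound how long a latent failure can remain in the system.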


Figure 6. Safety analysis flow chart (recoverable block labels: DAFCS system definition; monitor
specification; preflight test specification; simulation, rig safety studies; determine safety critical
functions; noncritical functions; determine critical functional failure modes; noncritical failure
modes; analysis, failure modes effect test, monitor effectiveness verification; determine
detectability; reliability, determine failure rates; functional and detail FMEA, determine failure
modes; computation model; calculate probability of catastrophic failure; feedback to system
definition, monitor spec, and preflight test spec)


Probability of catastrophic failures was thus calculated to be 0.232 x 10^-6 for a 1.5-hour mission.

Preflight BIT Development

Preflight  BIT  development  was  an  evolutionary  process  that  resulted  in  a self-test  configuration 
verified  to  comply  with  specified  requirements.  Important  milestones  in  the  preflight  test  development 
included  these: 

• BIT Approved for Flight Test, May 1975 - The decision was made that BIT verification results
to date justified reliance on BIT for preflight testing; i.e., comprehensive manual preflight
testing could be eliminated in the JA-37-21 flight test program.

• Preflight BIT Approval, 1976 - Verification results approved, redesign actions essentially
complete.

Following  is  a brief  description  of  key  activities  associated  with  JA-37  DAFCS  preflight  BIT 
development. 


Preflight BIT Configuration Definition -- Preflight BIT test methods were devised to facilitate
performance test and fault localization to line replaceable unit without the use of external test equipment.
Test tolerances were determined such that passage of tests confirmed system performance consistent
with aircraft mission requirements. Test methods were documented in a comprehensive DAFCS preflight
test method specification.

Preflight  BIT  Effectiveness  Analysis  --  Test  quality  consists  of: 

• Fault  Detection  Capability 

• Fault  Localization  Capability 

• Performance  Verification  Capability 

A fault  is  defined  as  a deviation  from  specified  performance  that  demands  DAFCS  corrective  maintenance. 

Fault Detection Capability -- This is defined as the ability to detect faults by means of available
facilities.

$$\text{DAFCS Fault Detection Capability} = 1 - \frac{\lambda_{nd}}{\lambda_{tot}} \quad (\%)$$

where

$\lambda_{nd}$ = failure rate for DAFCS faults not detected

$\lambda_{tot}$ = failure rate for total DAFCS


Fault  Localization  Capability  --  This  is  defined  as  the  probability  of  replacing  the  faulty  DAFCS 
LRU  for  detectable  DAFCS  faults. 


$$\text{Fault Localization Capability} = 1 - \frac{FFR}{\lambda_{d}}$$

where

$FFR$ = total DAFCS frequency (failure rate) of faulty replacement

$\lambda_{d}$ = failure rate for detectable DAFCS faults

The frequency of faulty replacement for the DAFCS is defined as the failure rate for detectable
DAFCS faults that results in the replacement of a DAFCS LRU that does not contain the fault. The total
DAFCS frequency of faulty replacement is determined from the summation of the frequency of faulty
replacement for each DAFCS LRU.

Performance verification capability is established by a test-by-test comparison of estimated operational
requirements versus test and measurement tolerances, confirming that test and measurement tolerances
are consistent with operational requirements.

Fault detection capability and fault localization capability calculations indicate that both the fault
detection and fault localization requirements are met for the DAFCS. Calculated values are:

• DAFCS* fault detection effectiveness: 96.7 percent

• DAFCS fault isolation effectiveness: 99.6 percent
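The two capability definitions given earlier in this section can be worked through numerically in Python. The failure rates below are hypothetical values chosen only to show the arithmetic; they are not the data behind the 96.7 and 99.6 percent figures. The function names, and the assumption that the detectable failure rate equals the total rate minus the undetected rate, are likewise illustrative.

```python
def fault_detection_capability(lambda_nd: float, lambda_tot: float) -> float:
    """Percent of the total failure rate that the test detects."""
    return (1.0 - lambda_nd / lambda_tot) * 100.0


def fault_localization_capability(ffr_per_lru, lambda_d: float) -> float:
    """Percent of detectable faults that lead to replacement of the correct LRU."""
    ffr_total = sum(ffr_per_lru)   # total FFR is the sum over LRUs, as the text defines it
    return (1.0 - ffr_total / lambda_d) * 100.0


# Hypothetical failure rates (failures per 10^6 hours); not the program's data.
lambda_tot = 1000.0
lambda_nd = 40.0
lambda_d = lambda_tot - lambda_nd      # assumes every fault is either detected or not detected
ffr_per_lru = [2.0, 1.0, 1.0]

detection = fault_detection_capability(lambda_nd, lambda_tot)
localization = fault_localization_capability(ffr_per_lru, lambda_d)
meets_requirements = detection > 95.0 and localization >= 95.0
```

With these illustrative inputs the detection capability is 96 percent and the localization capability is about 99.6 percent, both above the 95 percent levels stated in the self-test requirements.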

Preflight BIT Verification -- Preflight BIT ability to detect hardware failure was verified by hardware
testing. The testing, performed on the SAAB-SCANIA rig using DAFCS prototype hardware,
included:

• Analysis of BIT test-measured values to compare with predicted nominals.

• Insertion of faults by opening connector pins, lifting component leads, etc. to exercise each
test included in the preflight BIT.


*Note: DAFCS preflight fault detection effectiveness should not be confused with in-flight potentially
catastrophic failure detection effectiveness, which is well above 99 percent.

CONCLUSIONS

The JA-37 DAFCS -- a single processor Digital Automatic Flight Control System with potentially
destructive authority -- has been designed, developed, and verified to have a catastrophic failure
probability less than 1 x 10^-6 for a 90-minute mission. The verification process was a major element of
the DAFCS development program. It involved rigorous analysis, testing, and configuration management.
This effort has proved cost effective in that it justified deletion of a redundant second channel in
the production DAFCS, significantly reducing production recurring costs.

SUMMARY

A triplex  digital  fly-by-wire  flight  control  system  was  developed  and  then  installed  in  a NASA  F-8C  aircraft 
to  provide  fail-operative,  full  authority  control.  Hardware  and  software  redundancy  management  techniques 
were  designed  to  detect  and  identify  failures  in  the  system.  Control  functions  typical  of  those  projected  for 
future  actively  controlled  vehicles  were  implemented.  This  paper  describes  the  principal  design  features  of  the 
system,  the  implementation  of  computer,  sensor,  and  actuator  redundancy  management,  and  the  ground  test 
results.  An  automated  test  program  to  verify  sensor  redundancy  management  software  is  also  described. 


SYMBOLS  AND  ABBREVIATIONS 


A, B, C                  channel designations
A_s, B_s, C_s            selected signals
C*                       aircraft response parameter
CA, CB, CC               hardware comparator designations
CAS                      command augmentation system
CBS                      computer bypass system
CIP                      computer input panel
DAC                      digital-to-analog converter
DFBW                     digital fly-by-wire
g                        acceleration of gravity, m/sec^2
IFU                      interface unit
K_C*                     pitch CAS loop gain, deg/sec/g
K_XF                     flap-to-stabilizer crossfeed gain, deg/deg
K_1, K_2, K_3, K_4, K_5  control system gain designations
LVDT                     linear variable differential transformer
MVL                      midvalue logic
N                        provisional failure count tolerance
N_z                      normal acceleration, positive down, g
Δp                       differential pressure, N/m^2
PRI                      primary flight control system
q                        body axis pitch rate, rad/sec
q̄                        dynamic pressure, kN/m^2
RAV                      remotely augmented vehicle
RM                       redundancy management
s                        Laplace transform variable
SAS                      stability augmentation system
V_CO                     crossover velocity, m/sec
VTAS                     true airspeed, knots
α                        angle of attack, deg
α_L                      limit angle of attack, deg
ω_n                      natural frequency, rad/sec
1.0  INTRODUCTION

The National Aeronautics and Space Administration (NASA) is conducting research in digital fly-by-wire
(DFBW) flight control for aircraft. The primary impetus for this work has come from the projected flight control
system requirements for advanced military and commercial aircraft, particularly in the area of active controls.
Such systems must have extremely high levels of reliability and computational capacity.

From 1972 to 1973, the NASA Dryden Flight Research Center (DFRC) conducted flight tests of a DFBW control
system in an F-8C aircraft. The F-8 DFBW aircraft was equipped with hardware developed for the Apollo
lunar module guidance, navigation, and control system, and, from its first flight, was flown with the basic
mechanical control system removed. A triplex analog fly-by-wire control system provided emergency backup
control. This program successfully demonstrated the feasibility of DFBW control for aircraft (refs. 1 to 4).

Since that program, analog fly by wire for production aircraft has become a reality in the F-16 airplane
(ref. 5). In addition, digital flight control has been actively investigated by government and industry in the
United States (refs. 6 to 17) and in Europe (ref. 18). The space shuttle orbiter will also depend on a redundant
digital fly-by-wire system for primary flight control (ref. 19). However, a fault-tolerant, flight-critical DFBW
system has yet to be flight tested. NASA is currently conducting research in several technology areas related
to advanced DFBW control. This research program is planned to culminate in the flight test of a fail-operative,
fail-safe triplex DFBW control system in the F-8C aircraft. The primary objective of the program is to provide
a design base for future practical DFBW control systems. The specific tasks related to the flight research
program are to formulate and evaluate in flight (a) hardware and software redundancy management concepts
for sensor, computer, and actuator systems; and (b) digital flight control laws of the type projected for
advanced aerospace vehicles. In addition to these tasks, selected redundancy management concepts for the
space shuttle orbiter will be flight tested on the F-8C aircraft. To do this, software routines developed for the
orbiter's primary and backup flight control systems will be executed in the F-8 DFBW system.

Flight hardware has been installed in the F-8C test bed, and final software verification and systems integration
testing has been completed. This paper describes the F-8 DFBW system design and mechanization, and
summarizes the ground test experience with the system.


2.0  SYSTEM  DESCRIPTION 

2.1  F-8C Testbed Aircraft

The F-8C aircraft (fig. 1) is a single-engine, single-place U.S. Navy fighter capable of supersonic flight.
The aircraft has a two-position variable incidence wing for reducing fuselage attitude during the landing
approach.

The modifications to the F-8C aircraft, which were entirely internal, consisted of the addition of hardware
and software subsystems. The mechanical linkages connecting pilot controls with surface actuation systems
were removed in the first phase of the program.

2.2  Overall System Mechanization

The  overall  mechanization  of  the  F-8  DFBW  control  system  is  shown  in  figure  2.  A triplex  digital  computer 
set  containing  the  control  law  and  system  redundancy  management  software  communicates  with  a specially 
designed interface unit (IFU). The IFU processes input data, which consist of pilot commands and aircraft
sensor  signals,  and  output  data,  which  consist  of  surface  commands,  cockpit  displays,  and  telemetry  data. 
Surface  commands  are  routed  through  a switching  mechanism  to  the  servodrive  electronics  and  then  to  the 
force-summed  secondary  actuators,  which  are  installed  in  series  with  the  existing  F-8C  power  actuators. 

There  are  five  actuator  sets:  one  for  each  aileron  and  horizontal  stabilizer  surface  and  one  for  the  rudder. 

The  triplex  analog  computer  bypass  system  (CBS)  provides  the  pilot  with  an  emergency  unaugmented 
command path to the control surfaces in the event of a total primary digital system failure. This path was
provided primarily to protect against a common-mode software failure in the infant stages of flight test. The
switching mechanism allows either the primary system or the bypass system to drive the secondary actuators
based on pilot selection or fault status.

Electrical  power  is  provided  to  three  independent  flight  control  buses  by  an  engine-driven  dc  generator. 
Each bus is protected by a 40-ampere-hour battery, which would allow approximately 90 minutes of operation
in  the  event  of  a loss  of  generator  power.  Secondary  actuator  hydraulic  power  is  provided  by  the  aircraft's 
three  hydraulic  systems,  each  of  which  supplies  one  of  the  triple  chambers  of  each  actuator. 

2.3  Hardware  Subsystem  Layout 

The  major  hardware  elements  of  the  DFBW  system  and  their  locations  in  the  F-8C  aircraft  are  shown  in 
figure  3.  The  triplex  digital  computers  and  IFU  are  on  a removable  pallet  assembly.  An  encoder/decoder, 
which is used to transmit cockpit display information to and from the IFU, is in the nose of the aircraft. In
addition  to  the  conventional  center  stick  and  rudder  pedal  controls,  a two-axis,  limited-displacement  sidestick 
is  available  to  the  pilot. 

The  computer  bypass  system  and  servodrive  electronics  are  contained  in  three  interchangeable  boxes.  The 
sensor pallet contains triply redundant rate gyros and accelerometers. Other sensors, including attitude gyros
and angle of attack and sideslip vanes, are at separate positions in the aircraft.

The  secondary  actuators  are  as  near  as  possible  to  the  F-8C  power  actuators.  The  shaft  of  each  secondary 
actuator  is  linked  through  mechanical  gearing  to  the  metering  valve  of  its  associated  power  actuator  at  the  point 
where  the  pilot's  controls  would  normally  be  connected. 

Four cockpit panels provide the pilot with primary system controls and displays (fig. 4). The mode and
gain panel gives the pilot pushbutton control over the primary and bypass modes on an individual axis basis.
Three five-position gain switches can be tied, through software, to any parameter that requires pilot evaluation.
Selection of pilot relief modes is made through the digital autopilot panel, which utilizes magnetically held
switches. Caution and warning messages are displayed on the annunciator panel. Four of the 20 annunciation
lights have built-in switches to perform software reset functions. The computer input panel provides the pilot
with a way to initiate preprogramed software sequences, such as the preflight test program. This panel is also
intended  for  in-flight  use  in  selecting  certain  control  system  options.  Two  thumbwheel  switches  are  used  to 
select  the  program.  A three-digit  display  indicates  the  selected  program.  When  the  enter  switch  is  depressed, 
the  displayed  program  is  executed. 
2.4  Primary Digital Flight Control System Mechanization

A functional  block  diagram  of  the  primary  digital  flight  control  system  is  shown  in  figure  5.  The  three 
channels (A, B, C) are identical. A variety of sensors and discretes is used by the primary system. Table 1
lists each input sensor, its redundancy level, and its signal type. Analog sensors are converted through a 12-bit
analog-to-digital converter. The only digital input is from the altimeter. Input discretes and their redundancy
levels are listed in table 2.

Each channel conditions and converts sensor data associated with that channel. The data are placed in the
buffer memory of that channel and in the buffer memories of the other two channels by way of the serial data
buses. Thus, each computer contains an identical set of redundant sensor data. Sensor redundancy management
is performed in software and the selected sensor signals are used in the control laws to compute surface command
signals. The serial data buses are also used by the computers to transmit and receive status information for
computer redundancy management. The intercomputer discretes are used only to synchronize the computers.

The IFU output consists of surface position commands to the horizontal stabilizer, ailerons, flaps (symmetrical
ailerons), and rudder; discretes to the cockpit panels; and telemetry data. The panel discretes are transmitted
serially to the encoder/decoder for distribution within the cockpit. The serial telemetry data, which
consist of internal digital computer data, are transmitted to an onboard tape recorder. Surface commands are
generated through a 12-bit digital-to-analog converter to an electronic switch. When the primary system is
active these signals are passed directly through the switch to hardware midvalue logic (MVL) select circuits.
There is a set of three MVL modules for each of the five secondary actuators. The function of the comparators is
described in the section on actuator redundancy management.

The secondary actuators use high-gain two-stage servovalves to control hydraulic pressure of
2.07 x 10^7 N/m^2 (3000 lb/in^2) across each of three pistons. Force summing occurs along the common output shaft,
which is mechanically linked to the metering valve of the existing dual-tandem F-8C power actuators. The nominal
characteristics of the secondary and power actuators are given in table 3.

2.5  Digital Control Computer

The  computers  used  in  the  digital  flight  control  system  are  general  purpose,  stored-program  machines  that 
use  a microprogramed  instruction  set.  In  addition  to  the  two  sets  of  eight  fixed-point  general  registers,  the 
computers  contain  eight  hardware  registers  for  floating-point  operations.  The  major  computer  characteristics 
are  listed  in  table  4.  The  F-8  DFBW  configuration,  which  will  be  flown  initially  with  24,576  words  of  memory, 
allows  expansion  to  32,768  36-bit  words,  including  store  protect  and  parity  bits.  The  computers  are  cooled  by 
individual  fans. 

2.6  Interface  Unit 

The  IFU  consists  of  all  of  the  equipment  necessary  to  process  and  condition  the  input  and  output  signals  of 
the  three  digital  flight  computers.  The  major  functional  design  requirements  were  to  provide  a simple  interface 
with the programer for most input-output functions, to provide interchannel isolation for prevention of
common-mode failures, to allow expansion to a fourth channel, and, finally, to provide a high degree of flexibility in the
research  application.  To  meet  the  F-8C  installation  envelope  requirements,  the  three  IFU  channels  were  packaged 
within  a single  enclosure,  but  each  channel  had  complete  electrical  isolation.  The  IFU  is  composed  of  individual 
modules  with  unique  functions. 

2.6.1  Control 

Each channel of the IFU contains a microprogramed controller that decodes computer commands, differentiating
the various types of input and output requests; controls data transfers between the various fields of the IFU;
and performs validity checking of control and data signals.

The control program is contained in a programed read-only memory in each IFU channel. The microprogram
flow is shown in figure 6. The microprogram cycles in a wait state, searching for a command from the computer
to begin a transaction. Upon receipt of the appropriate signal, the first word transmitted is decoded to
distinguish the type of transaction, and appropriate flags are set in the interface unit to alert or initiate activity. A
dialogue of signals and data between the controller and the computer continues until the transaction is complete.

A timeout function in the control section guards against response delays. If a microprogram step waits for a
response for longer than 32 microseconds, the transaction is terminated and an input-output error alarm is issued
to the computer.
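The timeout rule can be illustrated with a few lines of Python. This is only a sketch and not the IFU microprogram: the function name and the idea of supplying a list of per-step wait times are assumptions made for illustration. Only the 32-microsecond limit and the error-alarm outcome come from the text.

```python
TIMEOUT_US = 32  # maximum wait per microprogram step, from the text


def run_transaction(step_waits_us):
    """Walk through the steps of one transaction (hypothetical model).

    step_waits_us: time, in microseconds, that each step waited for a response.
    Returns ("complete", steps_done) or ("io_error_alarm", steps_done).
    """
    for steps_done, wait_us in enumerate(step_waits_us):
        if wait_us > TIMEOUT_US:
            # Terminate the transaction and raise the input-output error alarm.
            return ("io_error_alarm", steps_done)
    return ("complete", len(step_waits_us))
```

A step that waits exactly 32 microseconds is treated as acceptable here, since the text specifies "longer than" 32 microseconds.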

2.6.2  Computer  Interface 

Each  IFU  channel  communicates  with  its  associated  computer  by  way  of  a dedicated  interface,  which  consists 
of parallel input and output buses for data and a number of dedicated function interfaces that control the
input-output section of the computer and the signals to the interface unit.

All computer-to-IFU data transfers are computer initiated and are accomplished by way of a computer mode
that minimizes the requirement for the computer software to actively participate in the input-output process.
During this data transfer time, other operations can take place in the computer, but the input-output section is
not  available  for  those  operations. 

2.6.3  Encoder/Decoder

Functionally, the encoder/decoder can be considered a part of the IFU although it is physically separate.
One function of the encoder/decoder is to format switch inputs from the cockpit panels into several words and to
transmit these to the IFU by way of a serial digital data bus. The other function of the encoder/decoder is to
receive words transmitted from the IFU, process them, and drive the cockpit displays. As with the IFU, the
encoder/decoder is arranged in a channel configuration to provide fault tolerance.
2.6.4  Data Input

Each channel is capable of processing the input data listed in tables 1 and 2, which consist of 32 analog signals
and 10 discrete words of 16 bits each. Five of these discrete words are serial data. All of the analog inputs are
prefiltered in the IFU.

2.6.5  Data  Output 

Each channel can transmit 10 12-bit analog words and five discrete words of 16 bits each. Three of the
discrete words are sent in serial form. In the flight configuration, only four of the analog outputs are assigned:
commands to the horizontal stabilizer, the ailerons, the flaps (symmetrical ailerons), and the rudder.

2.6.6  Buffer  Memory 

Each  channel  of  the  IFU  uses  three  buffer  memories  to  accumulate  data  from  the  resident  channel  and  from 
the other two channels. The memories are used to store converted sensor data and computer-to-computer-transferred
data (crosslink). Each of the memories, which are of the first-in-first-out type, has a capacity of
64 16-bit words. A buffer memory is loaded by control pulses from the sending channel and unloaded by control
pulses from the receiving channel.
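The buffer behavior described above can be modeled as a bounded first-in-first-out queue. The following Python sketch is illustrative only: the class and method names are hypothetical, and the choice to reject a word when the buffer is full is an assumption, since the text does not say how overflow is handled.

```python
from collections import deque


class BufferMemory:
    """Hypothetical model of one IFU buffer memory (64 words of 16 bits, FIFO)."""

    CAPACITY = 64        # words, from the text
    WORD_MASK = 0xFFFF   # 16-bit words, from the text

    def __init__(self):
        self._words = deque()

    def load(self, word):
        """Store one word on a control pulse from the sending channel.

        Returns False if the buffer is full (overflow policy is assumed).
        """
        if len(self._words) >= self.CAPACITY:
            return False
        self._words.append(word & self.WORD_MASK)
        return True

    def unload(self):
        """Remove the oldest word on a control pulse from the receiving channel.

        Returns None if the buffer is empty.
        """
        return self._words.popleft() if self._words else None
```

Because loading and unloading are driven by different channels, the two sides never need to agree on timing; they only share the order of the words.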

2.6.7  Interchannel Data Transfer Control

So that each channel has access to all of the data, any data received by one channel are simultaneously
transferred  to  the  buffer  memories  of  the  other  two  channels,  as  shown  in  figure  7.  When  a channel  is  unpowered, 
it  cannot  participate  in  any  interchannel  transactions;  therefore,  each  operational  computer  commands  its  IFU  to 
ignore  any  unpowered  channel. 

2.6.8  Data  Recording 

Each channel has a separate buffer memory for storing user-selected digital data for onboard recording.
The system is capable of recording 1000 32-bit data words per second per channel.

2.6.9  Failure  Detection 

Each channel contains failure detection circuitry to monitor the IFU and computer fault status. Figure 8
shows the fault detection functions that result in a channel fail declaration. The channel fail signal is set by any
of the following conditions:

(a)  Failure  of  the  IFU  clock 

(b)  Out-of-tolerance  condition  on  the  +28,  ±15,  or  +5  volts  dc  power 

(c)  Absence  of  computer  input-output  activity  for  longer  than  57  milliseconds 

(d)  Computer-generated  fail  discrete  caused  by  the  built-in  test  equipment  of  the  computer  or  by  software 
fault  detection 

(e)  The  agreement  by  two  operational  channels  that  the  third  channel  has  hard  failed 

Each channel forms the hard-fail declaration signals for the adjacent channels from software-generated discretes.
These discretes are interlocked with the hardware fail signal to prevent a failed channel from declaring a hard
fail in another channel.
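The channel fail logic and the interlock can be summarized in a short Python sketch. It is a simplified reading of conditions (a) through (e), not the circuit of figure 8: the data structure, field names, and function names are assumptions, and the interlock is modeled simply by discarding a hard-fail declaration from any channel whose own hardware monitors have tripped.

```python
from dataclasses import dataclass

IO_TIMEOUT_MS = 57  # condition (c), from the text


@dataclass
class ChannelHardware:
    """Hypothetical per-channel monitor inputs."""
    clock_ok: bool = True
    power_in_tolerance: bool = True
    ms_since_last_io: float = 0.0
    computer_fail_discrete: bool = False


def hardware_fail(ch):
    """Conditions (a) through (d): faults detected within the channel itself."""
    return (not ch.clock_ok
            or not ch.power_in_tolerance
            or ch.ms_since_last_io > IO_TIMEOUT_MS
            or ch.computer_fail_discrete)


def channel_fail(own, voters):
    """Return True if this channel should be declared failed.

    voters: list of (voter_hardware, declares_hard_fail) pairs, one for each
    of the other two channels. Condition (e) needs both of them to agree, and
    the interlock ignores a declaration from a voter that has itself failed.
    """
    valid_votes = [declares and not hardware_fail(hw) for hw, declares in voters]
    return hardware_fail(own) or (len(valid_votes) == 2 and all(valid_votes))
```

With this arrangement a single faulty channel cannot remove a healthy one, because its vote is either outnumbered or blocked by its own hardware fail signal.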

2.7  Software  Structure 

The  F-8  DFBW  software  is  built  around  a flexible  executive  program  that  can  schedule  jobs  on  the  basis  of  a 
clock  or  event  interrupt  and  priority.  The  flight  control  program  is  written  entirely  in  assembly  language. 
Fixed-point  arithmetic  is  used  in  the  sensor  redundancy  management  routines  to  minimize  execution  time; 
floating-point  arithmetic  is  used  exclusively  in  control  law  routines.  The  sequence  of  the  F-8  DFBW  software  is 
shown in figure 9, along with measured execution times for one minor cycle.

An executive interrupt initiates the minor cycle. After the computer synchronization and computer redundancy
management jobs are completed, the next minor cycle interrupt is scheduled. The input data are then
read and used in the redundancy management and control law routines. These two routines are each subdivided.
To minimize transport delays, only those computations necessary to determine surface position commands are
computed prior to the output job. The data recording processor collects and transmits 20 full words of data
during each minor cycle. The computer input panel (CIP) processor handles pilot requests for initiation of
preprogramed functions, which include control law options, display of fault status, and initiation of the preflight
test program. The computer self-test is the lowest priority job and includes arithmetic unit checks, instruction
tests, and a memory sum check.
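As a quick consistency check, the 20 full words recorded per minor cycle can be compared with the recording capacity quoted in section 2.6.8. The sketch below assumes that the minor cycle is the 20-millisecond cycle described in section 3.1.1 and that a full word is 32 bits, as in section 3.3; the variable names are illustrative.

```python
MINOR_CYCLE_MS = 20     # assumed equal to the 20-millisecond cycle of section 3.1.1
WORDS_PER_CYCLE = 20    # full words recorded per minor cycle (this section)
BITS_PER_WORD = 32      # full-word length, as used in section 3.3

cycles_per_second = 1000 // MINOR_CYCLE_MS               # 50 minor cycles per second
words_per_second = WORDS_PER_CYCLE * cycles_per_second   # words recorded per second
bits_per_second = words_per_second * BITS_PER_WORD

print(words_per_second, bits_per_second)
```

Under these assumptions the software fills exactly the 1000 words per second per channel that the data recording hardware can accept.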

In  addition  to  the  software  for  the  active  flight  control  program,  other  software  modules  exist  for  ground  test 
use  only.  These  include  the  preflight  test  program,  special  real-time  display  programs  for  use  with  a ground- 
based  cathode  ray  tube,  and  a crosslink  loader  that  enables  the  ground  crew  to  load  two  computers  from  the 
third.  The  software  memory  allocation  is  given  in  table  5. 


3.0  SYSTEM  REDUNDANCY  MANAGEMENT 

The responsibility for redundancy management of the primary system is shared by the hardware and software
functions. This section describes the redundancy management designs for the computers, sensors,
discretes, and actuators. The system design ground rules require fail-operative, fail-safe capability for critical
functions and fail-safe capability for noncritical functions.

3.1  Computer Redundancy Management

Several features were implemented within the primary system to detect hardware or software failures within
a computer or its interface and to isolate the failed system so that the remaining processors could continue normal,
transient-free operation. In addition, each processing system was designed to be tolerant of power transients,
which could cause momentary cessations of computation. The main design features involved in computer
redundancy management are synchronization and error handling.

3.1.1  Synchronization 

To insure that each processor handles input and crosslink data at the same two points in the computation
cycle, the computers are software synchronized to begin each 20-millisecond cycle within 10 to 50 microseconds
of each other. No other synchronization points exist within the computation cycle. Output commands are sent
independently by each channel when they are available. Because of the close tolerance on the synchronization
point at the beginning of each cycle and the particular minor cycle structure used, output synchronization is
unnecessary. The close synchronization also provides an excellent measure of channel health and hence
becomes an important element in computer fault detection.

To synchronize the three computers, discrete signals are sent from each computer, through the IFU, to each
of the other two computers (fig. 10). At the beginning of each cycle, an internal computer clock interrupt
occurs and the computer issues a discrete signal to each of the other computers. The computer then reads the
discretes it has received from the other computers. If discretes are present from both computers, the discrete is
reset, and a second read is performed to insure that the other computers have reset their discretes. The
computer clock is then reset to interrupt at the next cycle time. If, after a short wait to allow for skew between
processors, one computer fails to synchronize with the other two, the two remaining computers exit the sync
program and continue normal processing.

One subroutine of the synchronization program is the intercomputer transfer, or crosslink, of a small,
selected set of data to verify that each computer agrees with the state of each of the others. When each computer
reaches the run state, six 16-bit words are exchanged with the other computers. If a computer finds a
disagreement, it requests a restart. The synchronization and crosslink algorithms are designed to operate when any two
of the three or all three computers are powered. Dual discretes are used to identify faults in the discrete signals.
If the two discretes do not agree, a synchronization failure is declared.

3.1.2  Error  Handling 

Five types of computer/IFU errors are recognized by the system, and each error type results in a different
action. The error types are alarms, restart requests, soft failures, hard failures, and self failures.

3.1.2.1  Alarms

Alarms are generated by either hardware- or software-detected errors. Some of the typical alarms are as
follows: illegal instruction, store protect error (that is, an attempt to use data as instruction or vice versa),
illegal address, parity error, computer self-test error, crosslink error, and IFU error. When an alarm indicates
a serious and possibly permanent fault, a restart is requested in an attempt to clear the fault. Other alarms are
merely logged for postoperation analysis.

3.1.2.2  Restart requests

A restart  is  an  online  reinitialization  of  the  software  to  bring  all  computers  into  agreement  as  to  the  state  of 
the  system.  Some  of  the  typical  causes  of  restarts  are  power  transients,  synchronization  failures,  certain  alarms, 
crosslink  failures,  and  disagreement  among  real-time  counters. 

Whenever  a restart  is  requested,  the  three  computers,  by  way  of  the  crosslink,  exchange  enough  data  to 
guarantee  that  they  are  in  agreement.  This  transmission  includes  the  choice  of  the  computer  considered  to  have 
valid  data  and  the  data  to  be  used  by  all  computers.  The  exchanged  data  are  58  16-bit  words,  and  include  such 
items  as  sensor  failure  history,  control  law  parameters,  and  pointers  to  align  major  cycle  sequences. 

To  prevent  continuous  restart  requests  caused  by  a failure  in  the  system,  each  computer  maintains  a count 
of  all  restart  requests.  If  the  number  of  restart  requests  made  by  any  computer  exceeds  a prescribed  tolerance, 
that computer is declared hard failed and its requests are henceforth ignored. The entire restart process takes
approximately  4 milliseconds  from  recognition  of  request  to  resynchronization. 
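The restart-count guard can be sketched as follows. This is a minimal Python illustration, not the flight software: the class name, the computer labels, and the tolerance value passed in are hypothetical (the text does not give the prescribed tolerance), and the sketch counts requests over the whole run rather than over a time window.

```python
class RestartMonitor:
    """Counts restart requests per computer and latches a hard failure."""

    def __init__(self, tolerance):
        self.tolerance = tolerance            # prescribed tolerance (value assumed)
        self.counts = {"A": 0, "B": 0, "C": 0}
        self.hard_failed = set()

    def request_restart(self, computer):
        """Return True if the restart request is honored."""
        if computer in self.hard_failed:
            return False                      # requests are henceforth ignored
        self.counts[computer] += 1
        if self.counts[computer] > self.tolerance:
            self.hard_failed.add(computer)    # declared hard failed
            return False
        return True
```

For example, with a tolerance of 2, the third request from the same computer is refused and that computer is latched as hard failed, while requests from the other computers are still honored.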

3.1.2.3  Soft failures

A soft  failure  is  a reversible  declaration  by  any  computer  that  one  (or  both)  of  the  other  computers  appears 
to be unpowered. A soft-failure declaration results when neither synchronization nor crosslink is received from
the  offending  channel  for  10  consecutive  passes.  After  declaring  a channel  soft  failed,  the  remaining  channels 
bypass  the  search  for  synchronization  and  data  from  the  failed  channel.  The  failure  declaration  is  cleared  when 
crosslink  data  from  the  soft-failed  channel  reappear. 

3.1.2.4  Hard failures

A hard  failure  is  an  irreversible  declaration  by  any  computer  that  one  (or  both)  of  the  other  computers  is 
powered  but  not  functioning.  Two  conditions  can  lead  to  a hard-failure  declaration:  the  absence  of  crosslink 
data  from  a computer  that  is  correctly  synchronized  for  10  consecutive  minor  cycles,  and  an  excessive  number  of 
restart  requests  in  a given  time  period.  If  two  computers  agree  that  the  third  is  hard  failed,  a channel  failure  is 
declared. The actuator redundancy management then reconfigures to prohibit the failed computer from providing
control outputs.
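The distinction between soft and hard failures can be illustrated with a small state tracker that one computer might keep for each of the other two. The Python sketch below is an interpretation of the rules in sections 3.1.2.3 and 3.1.2.4, under stated assumptions: the class and attribute names are hypothetical, the restart-count path to a hard failure is omitted, and the two-computer agreement needed for a channel failure is not modeled.

```python
CONSECUTIVE_LIMIT = 10  # consecutive passes or minor cycles, from the text


class PeerMonitor:
    """One computer's view of one other computer (hypothetical structure)."""

    def __init__(self):
        self.silent_passes = 0      # neither synchronization nor crosslink seen
        self.sync_only_passes = 0   # synchronized, but no crosslink data
        self.soft_failed = False
        self.hard_failed = False

    def update(self, sync_seen, crosslink_seen):
        """Process one pass and return "ok", "soft", or "hard"."""
        if self.hard_failed:
            return "hard"                      # irreversible declaration
        if crosslink_seen:
            self.silent_passes = 0
            self.sync_only_passes = 0
            self.soft_failed = False           # cleared when data reappear
        elif sync_seen:
            self.silent_passes = 0
            self.sync_only_passes += 1
            if self.sync_only_passes >= CONSECUTIVE_LIMIT:
                self.hard_failed = True        # powered but not functioning
                return "hard"
        else:
            self.sync_only_passes = 0
            self.silent_passes += 1
            if self.silent_passes >= CONSECUTIVE_LIMIT:
                self.soft_failed = True        # appears to be unpowered
        return "soft" if self.soft_failed else "ok"
```

The key asymmetry is that a silent channel is assumed to be unpowered and may return, whereas a channel that keeps synchronizing without delivering crosslink data is treated as powered but faulty and is never readmitted.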


3.1.2.5  Self failures


A self failure is an irreversible declaration by a computer that its own channel is no longer functional. A
self failure may be declared when a channel has too many input-output errors, is declared failed by the other two
computers for 10 consecutive minor cycles, finds the other two computers hard failed, or has an excessive rate of
restart requests. After a self-failure declaration, all active computing in that computer is terminated and the
computer goes to an idle state.

3.2  Sensor  Redundancy  Management 

Sensor redundancy management (RM) is divided into two phases (RM-1 and RM-2) that are executed at
different  times  in  the  computational  cycle.  The  only  function  of  RM-1  is  to  obtain  the  best  estimate  of  the  actual 
parameter  value  based  on  the  available  multiple  sensor  inputs.  RM-1  is  executed  during  every  minor  cycle 
between  the  completion  of  the  data  input  read  job  and  the  control  law  computations.  RM-2,  which  is  designed 
for  fault  detection  and  identification,  controls  reconfiguration  of  the  RM-1  select  logic.  RM-2  is  executed  late  in 
the minor cycle and may be performed at a slower rate than RM-1 in cases where isolation of a faulty sensor is
less  time  critical. 

A typical triplex RM algorithm is shown in figure 11. RM-1 begins as a midvalue select mode, changes to an
averaging algorithm after the first hard failure, and finally degrades to a default output value after the second
failure.  A hard  sensor  fault  is  declared  by  RM-2  when  a sensor  differs  from  the  selected  value  by  an  amount 
greater  than  the  allowable  tolerance  for  a given  number  (N)  of  consecutive  passes.  Failure-status  logic  monitors 
the  results  of  the  tracking  test  and  through  hard-fail  flags  causes  the  mode  or  function  using  that  sensor  to  be 
inhibited.  For  example,  should  the  entire  roll  rate  gyro  set  be  lost,  roll  stability  augmentation  system  (SAS) 
would  be  inhibited.  In  some  cases  annunciation  is  given  to  the  pilot  when  an  entire  sensor  set  has  been  lost. 

The  first  failures  of  sensors  in  a triplex  set  are  not  annunciated  to  the  pilot. 
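The two-phase triplex algorithm can be sketched in Python as follows. This is a simplified illustration of the description above, not the algorithm of figure 11: the class and method names are hypothetical, and the tolerance, the pass count N, and the default value are parameters whose actual values are not given in the text.

```python
class TriplexSensorRM:
    """Sketch of RM-1 (signal selection) and RM-2 (fault detection)."""

    def __init__(self, tolerance, n_passes, default=0.0):
        self.tolerance = tolerance      # allowable difference from selected value
        self.n_passes = n_passes        # N consecutive passes before a hard fail
        self.default = default          # output after the second failure
        self.failed = [False, False, False]
        self.miscompares = [0, 0, 0]    # consecutive out-of-tolerance counts

    def select(self, readings):
        """RM-1: midvalue select, then average, then default value."""
        good = [r for r, bad in zip(readings, self.failed) if not bad]
        if len(good) == 3:
            return sorted(good)[1]      # midvalue of three
        if len(good) == 2:
            return sum(good) / 2.0      # average of the two survivors
        return self.default

    def monitor(self, readings, selected):
        """RM-2: tracking test against the selected value."""
        for i, reading in enumerate(readings):
            if self.failed[i]:
                continue
            if abs(reading - selected) > self.tolerance:
                self.miscompares[i] += 1
                if self.miscompares[i] >= self.n_passes:
                    self.failed[i] = True   # hard sensor fault declared
            else:
                self.miscompares[i] = 0     # the count must be consecutive
```

In use, `select` would run every minor cycle ahead of the control laws and `monitor` later in the cycle, possibly at a slower rate. Note that once only two sensors remain, a disagreement places both equally far from their average, so in this sketch both are failed together and the output falls back to the default value.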

3.3  Discrete  Redundancy  Management 

A simplified representation of the discrete RM algorithm is shown in figure 12. The mask bits are initially
all set. During RM-1, the three input discretes are ANDed with their respective mask bits and the event output
is determined by a majority vote. During RM-2, the event output is compared with the masked discretes and
disagreements are declared to be provisional failures. When a specified number of consecutive disagreements
has occurred, the failure detector clears the corresponding mask bit and declares that input hard failed.

Because  the  majority  voter,  on  subsequent  RM-1  passes,  always  sees  a zero  for  that  input,  the  output  in  effect 
becomes  the  AND  of  the  remaining  two  inputs.  If  the  two  remaining  inputs  disagree  for  N passes,  the  output  is 
set  to  zero  and  the  failure  status  logic  sets  the  event  failure  flag. 

A novel approach has been taken in the software implementation of this algorithm. Rather than processing
each trio of discretes separately, up to 32 sets may be processed simultaneously by operating on packed full
words with logical instructions. The A inputs are contained in one word, the B inputs in a second, and the C
inputs in a third. The three masks are also 32-bit full words, one assigned to each input word. Similarly, the
failure flags, disagreement flags, and output are all 32-bit full words. In each case, a given bit position in the
full word is dedicated to a specific event. Great economy in execution time has been achieved with this technique.
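
The packed-word technique can be illustrated with a short sketch. The voting and masking follow the description above. The source does not say how the consecutive-disagreement count is kept, so this example assumes a history of the last N disagreement words ANDed together; all names are hypothetical.

```python
from collections import deque
from functools import reduce
from operator import and_

WORD = 0xFFFFFFFF  # 32 events per packed word


def _persistent(history, n):
    """Bits that were set on each of the last n passes (assumed counting scheme)."""
    return reduce(and_, history) if len(history) == n else 0


class PackedDiscreteRM:
    def __init__(self, n_passes):
        self.n = n_passes
        self.masks = [WORD, WORD, WORD]          # mask bits are initially all set
        self.disagree = [deque(maxlen=n_passes) for _ in range(3)]
        self.pair_disagree = deque(maxlen=n_passes)
        self.event_failed = 0                    # event failure flags

    def rm1_vote(self, a, b, c):
        """RM-1: AND each input word with its mask, then majority vote per bit."""
        a, b, c = (x & m for x, m in zip((a, b, c), self.masks))
        out = (a & b) | (a & c) | (b & c)
        return out & ~self.event_failed & WORD   # failed events are held at zero

    def rm2_monitor(self, a, b, c, out):
        """RM-2: compare the output with the masked inputs and update the masks."""
        ma, mb, mc = self.masks
        triplex = ma & mb & mc                   # events with no failed input
        dual = ((ma & mb & ~mc) | (ma & ~mb & mc) | (~ma & mb & mc)) & WORD
        masked = [x & m for x, m in zip((a, b, c), self.masks)]
        for i, x in enumerate(masked):
            self.disagree[i].append((x ^ out) & triplex)
            hard = _persistent(self.disagree[i], self.n)
            self.masks[i] &= ~hard & WORD        # clear mask bit: input hard failed
        # In a dual event one masked input is zero, so the three-way XOR is the
        # XOR of the two remaining inputs.
        self.pair_disagree.append((masked[0] ^ masked[1] ^ masked[2]) & dual)
        self.event_failed |= _persistent(self.pair_disagree, self.n)

    def step(self, a, b, c):
        out = self.rm1_vote(a, b, c)
        self.rm2_monitor(a, b, c, out)
        return out
```

Every operation acts on whole words, so the cost per pass does not depend on how many of the 32 bit positions are in use.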


3.4  Actuator  Redundancy  Management 

Computer  output  and  servoactuator  RM  is  handled  entirely  in  hardware.  As  shown  in  figure  5,  the  command 
path  to  the  actuator  from  either  the  primary  or  bypass  control  system  is  controlled  by  both  pilot  command  and 
failure  status  signals.  Table  6 illustrates  the  switch  logic  for  sample  events  that  could  occur  while  the  primary 
system is active.

If comparator CA indicates a discrepancy between the selected midvalue and that channel's input command,
the A switch transfers to the CBS state. In this mode, the A input is forced to track the MVL output. If a second
comparator, CB, indicates a failure, both the B and C switches transfer to the CBS state, placing the bypass
system in complete control of that secondary actuator. No single failure can cause all three channels to transfer
to the bypass mode.

Fault detection logic mechanized in the primary digital subsystem in both hardware and software can also
cause the switch to change state. A single digital system failure in channel C, for example, causes switch C to
transfer to the CBS state. A second failure, such as channel B, causes both A and B switches to transfer to the
CBS state. In this case, all three axes of control are transferred to the bypass system.

Note from figure 5 that for an unfailed condition or for channel element failures prior to the MVL module, the
three actuator commands generated by the primary or bypass systems are identical. If an MVL module, a servo
electronics element, or a secondary actuator element fails, fault detection occurs within the servoelectronics
module.

Figure 13 is a functional block diagram of a single channel of one of the secondary actuators and the
electronics of that channel. The servoamplifier signal is the sum of the position command, the shaft position
feedback, and a nonlinear function of differential pressure (Δp). The latter signal (Δp) is designed to minimize
the low-frequency force fight that occurs in the high pressure gain system. The circuit acts to drive the Δp of
each valve toward the midvalue of all three. The bandwidth and authority of the Δp feedback are limited to allow
detection of faults that could be hazardous. If the Δp of one valve differs from the midvalue-selected Δp by
approximately 8.3 × 10^6 N/m^2 (1200 lb/in^2) for 400 milliseconds, a fault is declared which disables the engage
solenoid, dumps supply pressure to return, and opens a bypass path around the piston in the failed channel.
Faults are annunciated in the cockpit and reset capability is provided to the pilot. A second failure in the same
actuator results in that actuator's being turned off. Mechanical centering springs move the disabled actuator to
a safe static position.

4.0  CONTROL  LAWS

The control laws mechanized for the F-8 DFBW aircraft were selected to include several functions projected
for use in future active control applications that would require full-time, full authority control (ref. 20). The
pitch, roll, and yaw axes have multiple pilot-selectable modes. Each axis contains a DIRECT mode which
provides unaugmented control of the aircraft. In this mode, the surface command is the sum of stick and trim
components. A pitch SAS mode uses scheduled pitch rate feedback to improve short period damping. These modes
provide less complex configurations in the event of failures in certain sensors in the more highly augmented
pitch command augmentation system (CAS) and lateral-directional SAS modes which contain the active control
functions. Only the pitch CAS and lateral-directional SAS modes are described in detail here.

All inner loop control law functions are computed at the minor cycle rate of 50 samples per second. Autopilot
functions, scheduled gain updating, and other less time-critical operations are computed at the major cycle rate
of 12.5 samples per second. Both the minor and major cycle rates can be altered by single-entry changes in the
executive timing routine. Filter coefficients, however, are not automatically changed with changes in sample
rate.
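
With the rates quoted above, one major cycle spans four minor cycles (50 / 12.5 = 4). The relationship can be sketched as a simple executive loop. The function names, the callback interface, and the choice to run the slower jobs on the first minor cycle of each major cycle are assumptions for illustration only.

```python
MINOR_RATE_HZ = 50.0   # inner loop control laws
MAJOR_RATE_HZ = 12.5   # autopilot functions, scheduled gain updating


def minor_cycles_per_major(minor_hz=MINOR_RATE_HZ, major_hz=MAJOR_RATE_HZ):
    """Number of minor cycles in one major cycle; must be a whole number."""
    ratio = minor_hz / major_hz
    if ratio != int(ratio):
        raise ValueError("major cycle must be an integer number of minor cycles")
    return int(ratio)


def run_executive(n_minor_cycles, inner_loop, outer_loop):
    """Call inner_loop every minor cycle and outer_loop once per major cycle."""
    per_major = minor_cycles_per_major()
    for k in range(n_minor_cycles):
        inner_loop(k)
        if k % per_major == 0:
            outer_loop(k)
```

Changing either rate constant changes the schedule in one place, but any filter coefficients computed for the old sample period would still have to be recomputed separately.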

The control laws were designed in the continuous domain. The Tustin transform was used to obtain discrete
versions of the continuous filters. At a sample rate of 50 samples per second, the Tustin transform and the
matched bilinear transform described in reference 21 yield nearly identical results. The control law modules are
functionally independent of the redundancy management software and use only midvalue-selected sensors and
majority-voted discretes.
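
As an illustration of the Tustin step, the sketch below discretizes a first-order lag 1/(τs + 1) by substituting s = (2/T)(z − 1)/(z + 1), which gives the difference equation y[n] = a·y[n−1] + b·(x[n] + x[n−1]) with a = (2τ − T)/(2τ + T) and b = T/(2τ + T). The filter chosen and the function names are assumptions for the example; the actual filters are those of the control law figures.

```python
def tustin_lag_coefficients(tau, sample_rate_hz):
    """Return (a, b) for y[n] = a*y[n-1] + b*(x[n] + x[n-1]), the Tustin
    discretization of the continuous lag 1/(tau*s + 1)."""
    T = 1.0 / sample_rate_hz
    denom = 2.0 * tau + T
    return (2.0 * tau - T) / denom, T / denom


def step_response(tau, sample_rate_hz, n_samples):
    """Unit step response of the discretized lag, starting from rest."""
    a, b = tustin_lag_coefficients(tau, sample_rate_hz)
    y_prev, x_prev, out = 0.0, 0.0, []
    for _ in range(n_samples):
        x = 1.0
        y = a * y_prev + b * (x + x_prev)
        out.append(y)
        y_prev, x_prev = y, x
    return out
```

Because the sample period T appears in both coefficients, a filter computed for 50 samples per second is no longer correct if the cycle rate is changed, so the coefficients must be recomputed by hand.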

4.1 Pitch Command Augmentation System Mode

The pitch CAS mode includes several typical active control functions. If a new airplane were being designed,
such control laws would be considered in the initial trade studies leading to the ultimate configuration. No
structural or aerodynamic changes were made to the F-8C aircraft in conjunction with these control laws;
therefore, it is not possible to extract performance trade-off information directly from the F-8 DFBW flight test
results. The control law research objective of the F-8 DFBW program is to evaluate the mutual interactions of the
control functions in a full authority, flight-critical digital implementation.

Figure 14 illustrates the command augmentation functions implemented in the CAS mode. The basic controller
combines prefiltered stick deflection with pitch rate and normal acceleration. The resulting signal is routed to
the actuation system by way of a variable gain (K; the subscript is not legible in the source), which is scheduled
with the dynamic pressure derived from altitude and Mach number. To minimize excessive stick forces during
large changes in airspeed, neutral speed stability is provided by an effective forward-loop integration. The
integration is mechanized by the cancellation of the position feedback signal of the secondary actuators at low
frequencies.

A significant feature of this control law is that it was designed through the application of linear optimal
control theory at selected flight conditions. Specifically, the motion variable C* (its defining equation is not
legible in the source) was compared with the output of a linear second order command model with a natural
frequency of 7.4 radians per second and a damping ratio of 0.91. Minimization of a quadratic cost functional,
which consists of the weighted sum of the C* error, its integral, the elevator rate, and the elevator command,
resulted in a control law that is a linear combination of the assumed state variables. The latter control law was
examined for low-gain loop closures and possible pole-zero cancellations. The result of these simplifications is
the pitch CAS control system shown in figure 14.
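
The quoted command model parameters can be explored with a short sketch. It assumes the standard unity-gain second order form ωn² / (s² + 2ζωn·s + ωn²), which the source does not state explicitly, and integrates the step response with a simple fixed-step scheme. The function names are hypothetical.

```python
import math

OMEGA_N = 7.4   # natural frequency, radians per second (from the text)
ZETA = 0.91     # damping ratio (from the text)


def command_model_poles(wn=OMEGA_N, zeta=ZETA):
    """Real and imaginary parts of the complex pole pair (underdamped case)."""
    return -zeta * wn, wn * math.sqrt(1.0 - zeta * zeta)


def command_model_step(duration_s=3.0, dt=0.001, wn=OMEGA_N, zeta=ZETA):
    """Unit step response by semi-implicit Euler integration (assumed form)."""
    x, v, xs = 0.0, 0.0, []
    for _ in range(int(round(duration_s / dt))):
        accel = wn * wn * (1.0 - x) - 2.0 * zeta * wn * v
        v += accel * dt
        x += v * dt
        xs.append(x)
    return xs
```

With a damping ratio this close to one, the model response settles with very little overshoot, which is the behavior the C* error term in the cost functional rewards.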

The design specifications for the longitudinal control system required that angle of attack be limited. The
implementation of the angle-of-attack limiter and its integration with the basic C* control law are shown in
figure 15. The design of this control law was also accomplished by the use of optimal control theory. The
quadratic cost functional included pitch rate, angle of attack and its integral, elevator rate, and elevator command.

As in the case of the basic C* design, the control law obtained from the optimal control theory at selected flight
conditions was rearranged to yield the final configuration. The resulting control law uses angle of attack and
high-passed pitch rate. The angle of attack is referenced to the limit angle of attack.

As shown in figure 14, the switching logic selects the more positive (nosedown) command of either the
normal CAS command or the α-limiter command. The high-passed pitch rate provides an anticipatory term when
the limit angle of attack is approached rapidly.

Direct lift produced by symmetrical aileron deflections is utilized for drag reduction in maneuvering flight
and for ride smoothing in turbulence. The complementary structure of the direct lift mode is illustrated in
figure 16. Low-frequency symmetrical aileron deflections are commanded by pitch rate, which is routed through
a first-order lag with a time constant of approximately 2 seconds. This mode provides a trailing edge flap
deflection schedule that was chosen for a maximum lift-to-drag ratio. Ride smoothing is accomplished by relaying
rigid-body normal acceleration to the symmetrical ailerons through the gain Kj and a high-pass filter with a time
constant of 0.4 second. Kj is scheduled with dynamic pressure. The high-pass filter attenuates the command
during steady maneuvers. Changes in pitching moment due to symmetrical aileron deflection are canceled by use
of a crossfeed signal through a gain which is chosen to be the ratio of the pitching moments produced by unit
deflections of the horizontal tail to those of the symmetrical ailerons. Reductions of 10 to 30 percent in rigid-body
normal accelerations due to gust disturbances are predicted (ref. 20).

4.2  Lateral-Directional  Stability  Augmentation  System  Modes 

Figure 17 illustrates the mechanization of the augmented modes for the lateral-directional SAS modes.
Although the roll and yaw SAS modes are individually selectable by the pilot, they are treated collectively in
this discussion. The criteria for the design of this system included improved damping of the Dutch roll
oscillatory mode and good directional stability and turn coordination at all usable angles of attack. Application
of the linear quadratic optimal control algorithm at selected flight conditions yielded a feedback gain matrix with
nonzero gain on every state variable to every control input. A separate algorithm, described in reference 20, was
used to drive to zero the gains for which implementation was impractical. In the resulting control mode,
high-passed yaw rate provides increased Dutch roll damping with no steady-state effect. Turn coordination is
provided by compensated lateral acceleration and an aileron-to-rudder interconnect. Minimization of steady-state
sideslip is achieved by feeding the integral of lateral acceleration to the rudder. The gains are scheduled with
angle of attack to yield good performance in all maneuvering conditions.

4.3  Autopilot  Functions 

In addition to the inner loop functions, the control law set includes several autopilot functions: attitude hold,
Mach hold, altitude hold, and turn command. Control stick steering is available to allow pilot control through
the autopilot.

4.4  Remotely  Augmented  Vehicle  Mode 

In addition to the control modes described above, a special remotely augmented vehicle (RAV) mode has been
mechanized to provide increased research capability (fig. 18). Pilot stick commands and vehicle motion sensors
are telemetered to a ground station as 10-bit words. Highly speculative and advanced control concepts are to be
programed in FORTRAN on the ground-based minicomputer. The control surface commands are then transmitted
to the F-8C aircraft, where an uplink receiver and decoder preprocesses the commands and sends them to the
primary digital system. Pilot-selectable RAV modes in the digital computer software test the validity of the
surface commands and route them to the actuator drive electronics.

The  RAV  mode  is  an  asynchronous  job  which  is  executed  at  the  rate  of  100  samples  per  second.  Control  of 
the  aircraft  through  the  RAV  mode  is  selectable  by  axis.  In  the  event  of  an  uplink  fault  or  an  invalid  command, 
the control system automatically reverts to the SAS modes.

The RAV mode of operation will be used to evaluate concepts such as advanced adaptive control, where the
researcher  will  be  able  to  modify  the  control  laws  easily  without  the  possibility  of  adversely  affecting  the  basic 
primary  control  law  software. 

5.0  GROUND  TEST  EXPERIENCE 

The majority of the system integration and software verification testing was accomplished in the NASA DFRC
iron bird facility (fig. 19). A modified F-8 airframe provides a complete test bed for the DFBW system. The
systems in the iron bird include a primary digital system pallet, an encoder/decoder, a computer bypass system,
actuator drive electronics, a complete set of secondary and power actuators, three independent hydraulic
systems, three independent electrical systems, and complete cockpit panels. The iron bird is interfaced with an
all-digital nonlinear representation of the F-8C aerodynamics, which allows complete pilot-in-the-loop simulation.

To facilitate ground testing, several real-time displays were developed for use with a ground-based cathode
ray tube and printer to display either preprogramed data lists or user-specified data lists. Test data are
primarily collected by the data recording channel. The nine-track tapes generated at the iron bird are processed
directly on the NASA DFRC central computer.

More  than  1500  hours  of  ground  tests  have  been  accumulated  on  the  primary  digital  flight  control  system, 
using  both  a breadboard  triplex  system  and  the  actual  flight  hardware.  This  testing  has  included  both  software 
verification and system performance evaluation.

5.1 Sensor RM Verification Experience

Considerable experience has been gained from the software verification testing of the F-8 DFBW sensor
RM algorithms. A philosophy of exhaustive testing was adopted at the outset because of the varied types of
sensors involved. Complete testing of the RM algorithm for all sensors proved to be a sizeable effort and spurred
the development of a software tool to speed the verification process and reduce the engineering test time required.

5.1.1  Test  Requirements 

For each analog sensor, several characteristics of the redundancy management algorithm had to be tested.
First, three measurements had to be made: the tolerance level at which provisional failure counters start
counting, the value N at which a hard failure is declared, and the default value of the output following the second
failure. Second, proper operation of both the midvalue-select and averaging modes had to be verified. Third,
proper processing of the RM-2 counters and flags had to be checked. Fourth, control law inhibits and
annunciation to the pilot had to be demonstrated to complete the test.

5.1.2 Test Method

To obtain the above test results, a means was required to excite the RM algorithms with simulated sensor
inputs. This excitation was provided by the RM support software package, which was developed at DFRC
specifically for the F-8 DFBW project. The software package is written in FORTRAN and is executed with the
real time simulation program used to support the iron bird testing. Using the software package, any of the wave
forms shown in figure 20 can be injected independently into the inputs of the RM algorithm. The package is
controlled from a remote terminal as shown in figure 21, which permits real-time selection of the wave form,
controlling parameters, and engage and bypass status.


Any sensor can be selected by way of the terminal to be modeled as a redundant sensor. The particular
wave form and the characteristic parameters, such as drift rate or bias, are selected for each channel. The
contaminated channels can then be individually engaged or bypassed through the use of the terminal.

To efficiently test all elements of the RM algorithm, a composite test profile was programed into the RM
support software program to exercise all of the RM functions. Within a span of 33 seconds, six tests are
performed. These tests are controlled by user-inserted tolerance levels and an N value. After the RM support
software program is engaged, the test begins with impulse bursts above the tolerance level, but within a pulse
width of N-1 minor cycles so as not to trigger hard failures. The second segment introduces transients below the
tolerance level to verify that no provisional failures are declared. The third segment forces the midvalue select
logic to periodically switch among the three inputs. The fourth and fifth segments force a hard failure of sensor C
followed by the hard failure of a second sensor, which is expected to fail the entire sensor set. The sixth test
verifies the correct default value for arbitrary sensor inputs.

This sequence has also been used to test dual sensors. Two additional automated profiles have been
incorporated into the RM support software program. One of the automated profiles produces three-channel
synchronous sawtooth wave forms to test those sensors which have rate reasonability tests at the output of RM-1.
The second automated profile was designed to test dual and triplex discretes with fault impulses that increase in
width by one minor cycle each iteration.

5.1.3  Test  Results 

Figure 22 is an example of the digital plots obtained using the RM support software program to simulate a
typical triplex analog sensor. The sensor inputs and selected output have been normalized to the tolerance level
and, for the case shown, the provisional failure count N is five. Test segment 1 exercises the algorithm with
repetitive impulse bursts 50 percent above the tolerance level to verify that the fault detection logic is operating
properly. The output remains unaffected while the provisional fail counters tally down to one, stopping short of
the failure declaration level of zero, and then reset to five when the input returns to normal.
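
The counter behavior seen in test segment 1 can be reproduced with a short sketch. It assumes one counter per sensor that counts down from N on each out-of-tolerance pass, resets to N on a good pass, and latches at -1 once it reaches zero; the source does not say whether the latch happens on the same pass, so that detail and the helper names are assumptions.

```python
def run_fail_counter(out_of_tolerance, n):
    """Trace of a provisional-failure counter over a sequence of passes.

    out_of_tolerance: iterable of booleans, one per pass.
    n: number of consecutive bad passes that declares a hard failure.
    """
    counter, trace = n, []
    for bad in out_of_tolerance:
        if counter != -1:             # -1 means hard failed and latched
            if bad:
                counter -= 1
                if counter == 0:
                    counter = -1      # declare failure and latch (assumed same pass)
            else:
                counter = n           # good pass resets the count
        trace.append(counter)
    return trace


def impulse(width, total):
    """A fault impulse of the given width followed by good passes."""
    return [k < width for k in range(total)]
```

With N equal to five, `run_fail_counter(impulse(4, 6), 5)` tallies down to one and resets, matching the segment 1 description, while an impulse of width five latches the counter. Sweeping the width upward one pass at a time mirrors the widening fault impulses of the automated discrete profile.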

The second test segment verifies that no provisional failures are tallied for sensor excursions within the
tolerance limits. The output reflects the passage of the 1-hertz sinusoids when they appear on two or more of
the inputs.

The  third  segment  tests  proper  operation  of  the  midvalue  select  module.  The  output  wave  form  is  clipped  as 
sensors  A and  C alternately  become  the  midvalue.  Once  again,  no  activity  is  seen  in  the  failure  counters. 

The fourth test segment forces a failure in sensor C by applying ramp inputs to channels A and B to a
magnitude 1.5 times the tolerance level, thus simulating a null failure in the C sensor. When the ramp inputs reach
the tolerance level, the channel C counter begins to tally down. When the counter reaches zero, the C sensor is
declared failed, the algorithm switches from the midvalue to the average of the A and B sensors, and the C
counter is latched at -1 to signify that reconfiguration has been completed.

The fifth test segment introduces a divergence between the A and B sensor inputs to cause a second failure.
When the difference between the output and either of the inputs exceeds the tolerance level, the failure counters
begin to tally down. Because sensor input A is tested first by the algorithm, its counter reaches zero first.
When this occurs, the output is clamped to the default value (in this case, zero), the A counter is set to -1, and
all RM activity ceases for this sensor. Test segment 6 verifies that activity has ceased for sensor A and applies
a large signal on all three sensor inputs to assure that subsequent sensor A activity is not passed by the algorithm.

The RM support software program is also used to conduct failure mode and effects demonstrations during
piloted simulation and to evaluate overall performance of the sensor RM algorithms in various simulated
environments.

5.2  Computer  RM  Test  Experience 

The synchronization, restart, and fault detection performance of the primary system has been extensively
evaluated during the ground test phase of the program. The performance of the system has been substantially
improved by the changes made during the ground tests.

5.2.1  Synchronization  Performance 

The specification for maximum synchronization skew was 200 microseconds. In practice, much better
synchronization has been achieved. Figure 23(a) shows the normal sync discrete pattern observed on an
oscilloscope. The leading edge of the pulse indicates the setting of the intercomputer discretes by each channel. The
trailing edge of the pulse represents the time at which each computer has verified the correct sync state of the
other two computers. The computers search for the run state of the other computers before leaving the sync job.
Total execution time of the sync job is approximately 230 microseconds. The average skew time between computer
discretes at the end of the sync state is 10 microseconds. Occasionally, because of a variation in the computation
time, a computer arrives too late to be detected by the other two computers on the first check. In this case, as
shown in figure 23(b), an additional pass through the sync loop is required before a good sync state is achieved.
The three computers begin the run state within the normal sync tolerance even though computer B is late.

In the original design, rescheduling of the minor cycle interrupt followed the crosslink job. Because of
normal variations in the input-output execution times, synchronization discrete skew of approximately
40 microseconds was observed. When the interrupt scheduler was placed before the crosslink, a more stable sync
process resulted. It was also found that because they were no longer looking for the third computer, the computers
in the dual computer configuration were completing the synchronization job too quickly. The sync timing had to
be adjusted to prevent interference with the input-output process. In addition, it was discovered that execution
of a ground test display program in one computer resulted in a condition sufficiently out of sync to cause that
channel to be declared failed. Changes in the display program and the sync algorithm software were required.

The present algorithm is reasonably tolerant of transient disturbances, but still able to identify a permanent
sync failure in less than 200 milliseconds. The achievement of synchronization and successful crosslink was
found to be a strong positive indication of system health.

Ground tests have demonstrated the system's capability to resynchronize after numerous combinations of
power shutdown and recovery. Following power application, the triplex system returns to the run state without
ground crew or pilot interaction. Figure 24 illustrates a power shutdown and recovery sequence. Each of the
three outputs of the pitch digital-to-analog converters (DAC's) is approximately 0.6 volts before the power is
turned off. When channel A is repowered, it loops in the sync routine waiting for a second machine. The DAC
voltage peak is due to the characteristics of the DAC hardware following initial power application. Within
200 milliseconds of the channel B power application, synchronization is achieved between channels A and B and
the previous DAC output is restored. Likewise, channel C synchronizes with channels A and B after its power
is restored.

5.2.2  Restart  Performance 

Critical to the normal turn-on sequence and to abnormal power activity is the operation of the restart routines
that restore a channel to normal operation. The primary system was designed to be tolerant of transient
interruptions in processing due to supply voltage faults or intermittent hardware faults. The primary design for
restarts specified that DAC outputs be restored following the restart. Figure 25 shows the typical restart
performance of the system. The three pitch DAC signals are initially approximately 1.2 volts. A series of power
interruptions in channel A or channel B has no effect on the remaining two channels. In addition, each time
power is restored to a channel, the channel performs a restart and is initialized to the current output state by
the remaining two channels. The rate of restarts in the example is not high enough to hard fail any channel.

In the development of the restart process, some problems were encountered in assuring that a restarted
computer had valid data before it resumed normal operation and that the other two channels remained unaffected by
the process. Undesirable transients were observed on more than one DAC following a single channel restart.
Modifications were made to the logic that selected the channel to be used as a source of data for the other
channels, and to the sensor RM logic, which verified the presence of good data as indicated by a leading valid bit in
each offset binary data word.

All  features  of  the  error  handling  routines,  including  hard-,  soft-,  and  self-failure  actions,  were  tested  and 
found  to  perform  as  required,  although  the  number  of  allowable  provisional  failures  had  to  be  increased  from 
initial design values.

5.3 Control System Test Experience

5.3.1  Control  Law  Performance 

Operation of the control system modes on the iron bird simulator has been very similar to that predicted
during the analytical design phase. All modes have been evaluated over the flight envelope in closed-loop
simulation. Pilot comments have been favorable, but only actual flight tests will reveal the extent of the benefits
provided by many of the control modes, such as the ride smoothing and maneuver flap systems.

A substantial improvement in the airplane response is expected in the augmented modes. Figure 26 shows
the response of the simulated F-8C aircraft to a pilot step input in the DIRECT, SAS, and CAS modes. The short
period response is markedly improved in the augmented modes. Figure 27 shows the Dutch roll response of the
airplane in the DIRECT and SAS modes in the roll and yaw axes. Dutch roll damping is improved in the SAS
modes. Other results have shown that sideslip due to aileron inputs is reduced in the SAS modes.

The operation of the angle-of-attack limiter during piloted simulation is shown in figure 28. The limit for
angle of attack was set at 18°. During the first part of the maneuver (a wind-up turn), the limit angle of attack
is reached and held within 1° while the pilot smoothly moves the stick full back. In the second part of the
maneuver, a rapid full back stick maneuver is demonstrated. As in the first part of the maneuver, angle of attack is
held within 1° of the limit until back stick pressure is released sufficiently to return control to the pilot.

No significant problems were encountered in the development and refinement of the control laws on the iron
bird. Some adjustments to the gain schedules were necessary to improve performance at some high
angle-of-attack and high dynamic pressure conditions.

5.3.2  Actuator  Performance 

The design bandwidth and hysteresis for three-channel operation have been met. However, some developmental
problems were encountered in the pressure equalization circuit. Two of the hydraulic systems used to
power the secondary actuators were also used to supply the aircraft's power actuators. Because the third
hydraulic system served only utility functions, during surface movement it did not display as large a pressure
drop as the other two systems. One result of this was a small actuator oscillation during quiescent operation.

This problem was alleviated by adjustments in the pressure equalization bandwidth and deadband.

Fluctuations in pressure in the channel supplied by the utility hydraulic system were also smaller than
those of the other two channels when an actuator was operated into its stops and, as a result, were likely to
cause nuisance disconnects of that channel. To prevent operation into and out of the secondary actuator stops,
the position command to the actuator was limited.

One  additional  problem  that  required  system  modification  was  the  skew  in  the  turnoff  command  for  the  two 
remaining  actuation  channels  after  two  hardover  failures  in  one  actuator.  The  differential  pressure  existing 
during  the  50  to  100-millisecond  skew  was  sufficient  to  drive  the  control  surface  several  degrees  before  the 
actuator  was  completely  disabled. 
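
The size of this transient can be bounded with a short calculation. The sketch below is illustrative only: it assumes the surface moves at the power-actuator slew-rate limit listed in Table 3 for the entire skew interval, which gives an upper bound rather than the travel actually observed on the iron bird. The function and variable names are hypothetical.

```python
# Power-actuator slew-rate limits as listed in Table 3 (deg/sec).
SLEW_RATE_DEG_PER_SEC = {"pitch": 25.0, "roll": 70.0, "yaw": 120.0}

# Turnoff-command skew reported in the text (seconds).
SKEW_WINDOW_SEC = (0.050, 0.100)


def max_travel_deg(slew_rate_deg_per_sec, skew_sec):
    """Travel if the surface moved at its rate limit for the whole skew.

    Assumption: rate-limited motion throughout; real travel would be less.
    """
    return slew_rate_deg_per_sec * skew_sec


def travel_bounds():
    """Return {axis: (bound at 50 msec, bound at 100 msec)} in degrees."""
    return {
        axis: tuple(max_travel_deg(rate, t) for t in SKEW_WINDOW_SEC)
        for axis, rate in SLEW_RATE_DEG_PER_SEC.items()
    }


if __name__ == "__main__":
    for axis, (short, long_) in travel_bounds().items():
        print(f"{axis}: {short:.2f} to {long_:.2f} deg")
```

Under these assumptions the pitch bound is 1.25 to 2.5 degrees, which is consistent with the "several degrees" of surface motion reported above.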


6.0 CONCLUDING REMARKS


A flight critical, fault tolerant digital fly-by-wire primary control system has been designed, developed,
and ground tested in an iron bird facility and is near flight maturity. Ground tests have demonstrated the
system's ability to continue operation during normal system variations and to resume operation after transient
faults. To sufficiently test the sensor and discrete redundancy management algorithms, a special real-time
automated test sequence was developed, which substantially reduced the engineering test time required.

The majority of the developmental work on the system was required in the area of computer redundancy
management where the three channels were required to function as three independent devices instead of as one.
Good synchronization performance has been achieved and allows a sensitive determination of system health.
Moderate changes were made in the computer redundancy management algorithms during the ground test phase
to eliminate common-mode faults.

The sensor redundancy management software requires a large amount of execution time relative to the
control law computation, but less memory.

Ground simulation has shown that the control laws provide the predicted improvement in the aircraft's
response and handling qualities. The results of the flight test program, however, will ultimately determine the
overall acceptability of the system.
TABLE 1.-INPUT SENSORS

Sensor                                     Redundancy level   Signal type

Pitch rate                                 3                  dc
Roll rate                                  3                  dc
Yaw rate                                   3                  dc
Axial acceleration                         3                  dc
Lateral acceleration                       3                  dc
Normal acceleration                        3                  dc
Pitch center stick position                3                  ac LVDT
Roll center stick position                 3                  ac LVDT
Pitch side stick force                     3                  ac LVDT
Roll side stick force                      3                  ac LVDT
Rudder pedal position                      3                  ac LVDT
Angle of attack                            2                  dc
Angle of sideslip                          1                  dc
Horizontal stabilizer actuator position    3                  dc potentiometer
Surface positions (5)                      1                  dc potentiometer
Pitch attitude                             2                  Synchro
Roll attitude                              2                  Synchro
Heading angle                              2                  Synchro
Wing position                              3                  dc potentiometer
Mach number                                2                  dc potentiometer
Altitude                                   2                  Serial digital
Computer temperature                       1                  dc

TABLE 2.-INPUT DISCRETES

Discrete                     Redundancy level

Mode select                  3
Autopilot panel              2
Annunciator reset            2
Trim                         2
Sidestick enable             3
CIP digits                   1
CIP enter/clear              2
Wing up/down                 3
Weight on wheels             1
Gain switches                1
Gear position                3
Autopilot disconnect         1
Computer bypass status       3


TABLE 3.-ACTUATOR CHARACTERISTICS

                       Bandwidth,   Slew rate      Hysteresis,    Force or torque       Stroke
                       rad/sec                     percent of     output
                                                   full stroke

Secondary actuators    125.0        30 cm/sec      0.2            10,000 N              ±2.5 cm
Power actuators
  Pitch                12.5         25 deg/sec     0.1            3,[illegible] N m     +6.75°, -26.5°
  Roll                 30.0         70 deg/sec     0.1            1,000 N m             +45°, -15°
  Yaw                  25.0         120 deg/sec    0.1            356 N m               ±21°


TABLE 4.-DIGITAL COMPUTER CHARACTERISTICS

Central processor unit -
  Number system                           Binary, fixed point, two's complement;
                                          fractional hexadecimal floating point
  Operation                               Full parallel
  Fixed-point data                        16 and 32 bits, including sign
  Floating point data                     32 bits (24 bit mantissa) and 64 bits
                                          (56 bit mantissa)
  Typical execution times,
  register to register, µsec -
    Fixed point:
      Addition                            1.0
      Multiplication                      4.8
      Division                            8.4
    Floating point:
      Addition                            2.4
      Multiplication                      5.0
      Division                            10.5
  Average instruction rate, per sec       480,000

Main storage -
  Type                                    Random access, destructive readout,
                                          ferrite magnetic core, nonvolatile
  Cycle time, nsec                        900 read/write
  Addressable unit                        16-bit halfword
  Features                                Parity and store protect on halfword

Input-output -
  Type                                    Parallel halfword plus parity, multiplex,
                                          half duplex
  Maximum data rate, per sec              225,000 full words
  Discretes                               Four input, four output
  External interrupts                     Five levels

Physical characteristics -
  Weight, kg                              24 (32,000-word memory)
  Volume, m³                              0.025
  Power, W                                375 (32,000-word memory)
  Environment                             MIL-E-5400 Class 2X
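
The figures in Table 4 can be combined into a rough processing budget. The sketch below is illustrative only: the 20-msec minor cycle is an assumption read from the time axis of Figure 9, the operation mix in the example is hypothetical, and the function names are not taken from the flight software.

```python
# Values taken from Table 4.
AVERAGE_INSTRUCTION_RATE_PER_SEC = 480_000
FIXED_POINT_USEC = {"add": 1.0, "multiply": 4.8, "divide": 8.4}
FLOATING_POINT_USEC = {"add": 2.4, "multiply": 5.0, "divide": 10.5}

# Assumption: minor cycle length read from the time axis of Figure 9.
MINOR_CYCLE_SEC = 0.020


def instructions_per_minor_cycle(rate=AVERAGE_INSTRUCTION_RATE_PER_SEC,
                                 cycle_sec=MINOR_CYCLE_SEC):
    """Average number of instructions available in one minor cycle."""
    return round(rate * cycle_sec)


def execution_time_usec(op_counts, timing):
    """Register-to-register time for a given mix of arithmetic operations."""
    return sum(count * timing[op] for op, count in op_counts.items())


if __name__ == "__main__":
    # Hypothetical mix: a small filter with 4 multiplies and 3 additions.
    mix = {"multiply": 4, "add": 3}
    print(instructions_per_minor_cycle())
    print(execution_time_usec(mix, FIXED_POINT_USEC))
    print(execution_time_usec(mix, FLOATING_POINT_USEC))
```

With these assumptions about 9600 average instructions fit in one minor cycle, and the hypothetical filter costs roughly 22 µsec in fixed point against roughly 27 µsec in floating point.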


TABLE 5.-SOFTWARE MEMORY ALLOCATION

[Bar chart. Axis: memory, full words. Allocation categories: data; operating system and computer
redundancy management; control laws; sensor redundancy management; preflight test program; ground
display and load programs; unused. Marked limits: current configuration (24,576 words); expansion
capability (32,768 words).]
TABLE 6.-PRIMARY TO CBS SWITCHING LOGIC


Event


Comparator CA reports failure
Comparators CA and CB report failure
Channel C digital failure
Channels B and C digital failure
Pilot selects CBS


Figure  1.  F-8  digital  fly-by-wire  aircraft 


[Diagram labels: sensors and pilot commands; interface; digital computers; surface commands; servodrive
electronics; switch; secondary actuators; computer commands; power actuators.]

Figure 2. F-8 DFBW control system mechanization.


Switch state: Switch A, Switch B, Switch C

[Diagram labels: pallet assembly - computer and interface unit; autopilot; secondary actuators (5);
computer input panel; encoder/decoder; computer bypass system and servodrive electronics; sensor
pallet - gyros and accelerometers.]

Figure 3. F-8 DFBW hardware elements.


[Panel labels: annunciator panel; computer input panel; autopilot panel.]

Figure 4. Cockpit panels.


[Diagram label: decoder.]

Figure 5. Primary digital system mechanization.


Figure 6. Interface unit microprogram flow.


[Diagram labels: analog discretes; cockpit panel discretes.]

Figure 7. Input data flow for channel A.


Figure 8. Functional diagram of fault detection hardware in IFU (shown for channel A).
[Timing diagram. Axis: time, msec. Task labels: interrupt processor; computer synchronization; crosslink
and computer RM; executive scheduler; input data read; output to actuators and displays; CIP processor;
data recording; RM-1: select signal; control law 1; control law 2; RM-2: fault detection; computer
self-test.]

Figure 9. Software sequence and timing during one minor cycle. Three channels, direct modes.


Figure 10. Synchronization discretes.


[Diagram label: signal select.]

Figure 11. Triplex analog sensor redundancy management algorithm.


[Diagram labels: triplex discrete inputs; masked.]

Figure 12. Triplex discrete RM algorithm.


Figure 13. Schematic diagram of single channel of secondary actuator and servoelectronics.
[Block diagram elements: angle of attack, deg; 0.002 × VTAS; 1/(0.42s + 1); gain 0.4; pitch rate q,
deg/sec; gain 5.8; 0.42s/(0.42s + 1); output, deg/sec, to CAS signal selection logic.]

Figure 15. Angle-of-attack limiter.


[Diagram labels: to CAS mode; symmetrical aileron (flap) command, deg.]

Figure 16. Direct lift mode.


[Diagram labels: roll rate, deg/sec; integral plus bypass; aileron command, deg; rudder command, deg.]

Figure 17. Lateral-directional SAS modes.
Figure 18. F-8 DFBW remote augmentation vehicle (RAV) concept.


[Diagram labels: F-8 airframe; general purpose central digital computer (equations of motion, sensor
models); actuator commands; pallet assembly; conditioned sensor signals; surface positions; sensor
outputs; throttle inputs and surface positions; simulation interface unit.]

Figure 19. Iron bird facility.


Figure 20. Simulated sensor waveforms.


[Diagram labels: surface positions and throttle commands from Iron Bird; single sensor outputs to Iron
Bird; redundant sensor outputs to Iron Bird.]

Figure 21. Integration of redundant sensor models in real-time simulation.


(a) Normal.

(b) Computer B late.

Figure 23. Synchronization performance.


[Trace labels: channel A, channel B, and channel C horizontal stabilizer command, V; power.]

Figure 24. Power shutdown and recovery sequence.


Figure 25. Restarts induced by interruptions in power.


[Trace label: pilot stick position, cm.]

(a) DIRECT mode.

Figure 26. Longitudinal response of the simulated F-8 DFBW aircraft. Mach 0.6; altitude, 6100 m.


Figure 26. Continued.
L-1011 FLIGHT CONTROL SYSTEM
by

J.A. Flapper, Research and Development Engineer
E.O. Throndsen, Research and Development Engineer
Lockheed-California Company
P.O. Box 551
Burbank, California

SUMMARY 

This chapter describes those aspects of the L-1011 flight controls - primary and automatic - which are of interest because of the
state of the art advancements and the improvements which they represent. The flying tail primary control system, its rationale and
design features are dealt with in some depth. Integration of primary controls with the automatic flight controls is treated and the direct
lift control system and roll control are briefly described. The automatic controls are described with emphasis on the yaw stability
augmentation system and the automatic landing system. The former is in concept an “active control system” in that design loads are predicated
on its availability. The latter, for the final stage of landing in Category III conditions, is the forerunner of the “fly-by-wire” concept
for commercial transports. Again, system rationale or design features which enhance safety and reliability are treated.


INTRODUCTION

The L-1011 TriStar, one of the current generation of wide-body commercial transports, is a short to medium range airplane that
cruises typically at 33,000 feet and .85 Mach. Maximum takeoff and landing weights are around 470,000 and 360,000 pounds.
Depending on accommodations, it can provide space for up to 400 passengers plus a crew of 13. Figure 1 gives the airplane dimensions.

The TriStar has been in airline operation since initial certification by the FAA in April 1972. It has been subsequently certificated by
Canada (MOT), Great Britain (CAA), Japan (JCAB), West Germany (LBA), Saudi Arabia (DGCA) and Hong Kong (CAD). With over
130 airplanes presently in service, there have been about 700,000 revenue flight hours accumulated to date (4-1/2 years after initial
certification) at a rate of approximately 1.9 hours per revenue flight.

Although the L-1011 does not extend its flight envelope beyond the limits of precedent commercial subsonic jet aircraft, the higher
performance and safety demands plus the size of the aircraft and the availability of more advanced technology items provided an
opportunity to take a fresh look at control system concepts. The result has been that many advances and improvements have been incorporated
into both primary and automatic flight control systems in the TriStar. It is intended in the following discussion to describe some
highlights of these systems and features which relate to improvements in safety, performance and overall reliability.

The Primary Flight Control section discusses some special requirements and then describes aspects of the system implementation
which to a large extent are unique in the L-1011 flight control system. The pitch control system, being the most unconventional one,
partly because of the flying tail concept, is treated in some detail. Discussed are the general system configuration and special features of
the horizontal stabilizer power control system and the input system. The integration of the pilot input drive with the trim system is of
particular interest. The pitch autopilot servo system is also discussed in some detail. In addition some features of the Direct Lift Control
(DLC) system are described, with the emphasis on the DLC servo system. The roll and yaw control systems are only briefly treated.

The automatic flight control subsystems of the L-1011 Avionic Flight Control System are highly redundant in comparison to such
systems of the previous generation of aircraft. This redundancy resulted from the need for high integrity in the Category III Automatic
Landing System and the redundant configurations of the “cruise” portions of the autopilot and yaw control channels were also affected
by this Category III requirement. An overview of the overall system arrangement and operational features is given along with the physical
characteristics of the significant components. The overview is followed by a discussion of the principal features of the yaw stability
augmentation system, the “cruise” autopilot and the automatic landing system. In each of these three sections design criteria and basic
integrity related requirements are considered as well as the mechanization schemes and design features incorporated.

PRIMARY  FLIGHT  CONTROLS 

System Requirements

The present generation of wide-bodied airplanes has not materially expanded the flight envelope of the previous commercial jet fleet
and in that respect the functional performance requirements for the L-1011 primary flight controls did not change significantly from
those of previous commercial jets. However, the public safety demands which greatly increased over the last decade, partially as a result
of a large surge in airplane travel and partially as the result of increased public consciousness of safety in transportation, demanded a
significant improvement in system integrity. The increase in size of the new generation of airplanes and the resulting increase in potential
magnitude of accident effects served only to emphasize the need for speedy definition of new standards and for their immediate
implementation. Accordingly the U.S. Federal Aviation Agency (FAA) initially issued “Special Conditions” which were to be considered as an
integral part of the Regulations during the development of the first wide-bodied airplanes.

While the new standards of safety permeate all aspects of airplane design they probably had the most visible effect on the design of
the flight control systems. The effect on these systems was even more emphasized by the fact that electronics had developed to the point
where automatic landing became a practical reality and all-weather landing capability requirements added further depth to flight control
system integrity requirements. Application of statistical failure probability theory to determine flight control system integrity was very
much in its infancy during the initial development of these aircraft and therefore the FAA rules were expressed in more general language.

However, the rule was very specific with respect to the requirement to retain airplane controllability after any single mechanical failure
(continued safe flight and landing), and no exceptions to this were allowed, however improbable the failure. The rules with respect to
system jams were more subjective and allowed the designer to prove jam alleviation capability or extreme improbability. A similar rule
was established for multiple failures, where again retaining control capability or extreme improbability of the failure combination had to
be proved. At that time no generally accepted definition of “extreme improbability” was established and the developer had to use
conservative judgement in applying the rules. Some parenthetical examples shown in the “Special Conditions” were used as guide lines
(e.g. any single mechanical failure in combination with a hydraulic or electrical failure). Later during the development the interpretation
of these rules became more specific, particularly with respect to correlation between failures, secondary failure effects and also
consideration of higher probability failures which could make consideration of different failure combinations necessary. Furthermore, ways of
considering dormant or passive failures became better defined by gradual acceptance of mathematical probability methods.


Where these regulations set the basic safety requirements, actual design integrity requirements were obviously determined by the
severity of the potential failure effect and this in turn depended on the overall airplane control configuration. Some features inherent
in the basic airplane concept had a profound effect on control system integrity requirements. The most important ones were the size
of the airplane, the flying tail concept and the requirement for CAT III automatic landing capability.


• The  size  made  direct  manual  control  impossible  even  with  the  most  advanced  aerodynamic  control  techniques  and  even  when 
considered  for  emergency  control  only.  This  made  fully  powered  systems  mandatory  and  for  the  first  time  dictated  availability 
of  power  under  all  failure  mode  conditions  including  all  engines  out. 

• The  flying  tail  concept,  by  providing  a single  basic  pitch  control  surface  (as  compared  with  conventional  independent  control 
of  elevators  and  horizontal  stabilizer)  mandates  a driving  system  of  extreme  reliability. 

• All  weather  automatic  landing  capability  makes  it  necessary  that  in  the  final  stages  of  the  approach  and  landing  the  overall 
primary  flight  control  system  have  fail-operative  capability  with  a minimum  of  pilot  attention  or  need  for  corrective  action. 

System  Configuration  and  Implementation 


Figure  2 shows  the  general  control  surfaces  arrangement.  In  designing  the  system  configuration  a conscious  effort  was  made  to 
achieve  inherent  system  integrity  with  a minimum  of  reliance  on  additions  which  themselves  introduce  new  failure  modes.  Specifically 
in  the  pitch  control  system  this  has  led  to  some  unconventional  design  characteristics. 

PITCH CONTROL SYSTEM. The “flying tail” concept for pitch control was selected because of its operational and aerodynamic
control advantages. It requires summing of short term pilot and long term trim inputs at the input level, providing one integrated input
command to the single pitch control surface. It thereby prevents the problem of the pilot column and the trim system generating
contradictory commands at the aerodynamic control level and inherently solves the potentially dangerous “pitch upset” and “mistrim at
take-off” problems associated with the conventional separate elevator and horizontal stabilizer pitch control systems. However, this same
absence of multiple control surfaces demands a high degree of integrity of the pitch control system, particularly when required to meet
the contemporary safety requirements. Nevertheless, after many trade-off studies it became clear that the direct approach of providing a
single stabilizer control system with a very high degree of system integrity would offer the best and safest solution. This required a
system capable of absorbing all conceivable single component failures and a variety of multiple failures, without becoming inoperative
and without introducing inadvertent hazardous pitch commands. The following paragraphs describe the system configuration with a
short explanation of the underlying rationale. A schematic diagram of the pitch control system is shown in Figure 3.

To make full use of the potential of the flying tail control concept the system had to be configured to give the pilot full stabilizer
control through the column independent of trim input. This requires that the column input can override or negate any trim input and
still provide full control. An obvious way would be to provide 100% parallel trim as is often done in roll and yaw control systems. This
means that there would be a fixed relationship between column and stabilizer and that the trim system would only control the zero force
datum point of the artificial feel system. The pilot could then control the stabilizer to any position by applying sufficient force to
override the feel system. However, some other constraints on the system make this unacceptable. The integrated control system, including
the trim and feel subsystems, had to comply with the following objectives:

• Provide sufficient control capability through the column to enable the pilot to take off, fly or land the airplane with maximum
mistrim.

• Allow a maximum variation in zero force column position of 3.8 inches (determined by flight simulation tests).

• Provide a minimum column to surface position gain of 1 inch per g.

• Provide artificial feel forces to provide a minimum of 33 lbs/g in cruise and 50 lbs/g with flaps down.

• Keep the variation in stick force per g as a function of c.g. and flight condition to a minimum.

Complying with all these objectives resulted in a non-linear relationship between column and surface (J-curve) with a 50-50 mixture of
parallel and series trim and an artificial feel force controlled as function of trim angle and Mach number. Figure 4 shows the non-linear
relationship curve and the horizontal shift of that curve as a result of the series trim input. The “Trim Line” shown in the figure depicts
the column to surface relationship for zero column force. The relationship between column and surface for each trim position can be
represented by a line through the trim point horizontally parallel to the B and A lines. These lines show the relationship for minimum
(0° δH) and maximum (10° δH) stabilizer trim angles respectively. This non-linear relationship and the way the trim system interfaces
with the main pitch control system has the added advantage that no special measures were required to guard against a trim runaway.
One fixed maximum trim motor speed provides automatically the maximum required and maximum allowable trim rate for all flight
conditions.

In contrast to all other commercial airplanes the L-1011 trim system provides a rate/position control system for the flight crew via
trim knobs on the columns. Since the actual rotation of this knob generates the input command no trim switches are involved and a
trim runaway can not be caused by a faulty trim switch. Direct mechanical trim control is available by means of the conventional trim
wheels on the center console.

After the non-linearizer/trim interface had provided a superior pitch control configuration from the point of view of basic control
qualities, the task remained to implement this configuration in order to give it the necessary operational integrity. The following aspects
had to be considered:

• Provide sufficient hydraulic and mechanical redundancy to remain fully operative after multiple hydraulic failures (loss of
3 systems), mechanical failures such as link breakage, cable failure, loss of center support of cranks or combinations thereof.

• Retain operational capability in case of any conceivable system jam. This includes the surface/power servo system loops as
well as the input system.


It is not intended to mention here all the design details that were incorporated to enhance the total system integrity. Only special
features will be discussed and in the next paragraphs the more important component details will be touched on in sufficient depth to give
an impression of the need for detail design considerations necessary to meet the safety standards.
Stabilizer Power Servo System. A quadruple power servo system was considered essential. Apart from providing a high degree of
power redundancy in case of loss of multiple servo channels or their power sources, e.g. as the result of an engine explosion, collision or
criminally-induced structural damage, it also offered the potential of a reliable jam-breaking capability in case of a linear hydraulic
actuator jam. Each of the four servo channels is powered by a separate and independent hydraulic power source and each servo loop by itself
is able to control the airplane. The system is split up in two dual units (each consisting of two linear actuators and one servo valve unit
with feedbacks) located approximately eight feet apart, which drive in unison the center section of the horizontal stabilizer. The physical
separation was considered necessary to provide protection against localized destructive forces (explosion or fire) but posed the problem
of synchronizing the inputs to the servo units. This problem was somewhat facilitated by holding the stiffness of the stabilizer section to
a maximum of about 30,000 lbs/in, which allowed some mismatch in the servo position outputs without building up excessive force
differences between them. Each servo unit contains the necessary valving and accessories of two servo loops. The servo units are designed
in a way that the rupture of a pressure vessel cannot result in the loss of the fluid of more than one hydraulic system, each of which is
contained in a separate manifold (this feature is applied to all other hydraulically redundant components). The main control valve is a
tandem valve controlling two servo loops, whose actuators are installed side by side. The output path of each actuator incorporates a
very accurately manufactured shear pin at the lower connection which is capable of supporting safely the maximum output of a single
actuator, but which can be reliably sheared off by the force of three actuators when one actuator is mechanically or hydraulically locked.
The actuators are installed in pairs near their servo valve units. Both actuators of a pair are mechanically connected to each other to force
a disconnected actuator to retain its orientation in the airplane to prevent major secondary damage. The interconnecting linkage is strong
enough to support the load of an actuator if the upper attachment is lost (e.g. loss of the pin). The two main bearings of the stabilizer
have been designed to provide two additional concentric rotating bearings to allow stabilizer motion without damage to the bearing
structure in case the main bearing or even the main and secondary bearings freeze mechanically. The stabilizer mechanism is further shielded
from large objects which could lodge between the structure and the stabilizer, while smaller parts will be easily crushed or sheared by the
tremendous torque output capability of the servo system (approx. 70,000 ft-lbs) without inflicting major damage to tail or stabilizer
structure.
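
Two of the sizing arguments above can be illustrated with simple arithmetic. The sketch below is illustrative only: it treats the stabilizer section between the two servo units as a linear spring at the quoted maximum stiffness, and it expresses the shear-pin requirement as a window between the output of one actuator and the combined output of the other three. The actuator and pin forces in the example are hypothetical, not L-1011 data.

```python
# Maximum stabilizer section stiffness quoted in the text (lbs/in).
STABILIZER_STIFFNESS_LB_PER_IN = 30_000.0


def force_fight_lb(mismatch_in, stiffness=STABILIZER_STIFFNESS_LB_PER_IN):
    """Force difference built up by a servo position mismatch.

    Assumption: the structure behaves as a linear spring.
    """
    return stiffness * abs(mismatch_in)


def shear_pin_rating_ok(pin_shear_lb, single_actuator_lb, n_actuators=4):
    """Check the shear-pin window described in the text.

    The pin must carry the maximum output of its own actuator, yet shear
    under the combined force of the remaining actuators when its own
    actuator is locked.
    """
    combined_others_lb = (n_actuators - 1) * single_actuator_lb
    return single_actuator_lb < pin_shear_lb < combined_others_lb


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    print(force_fight_lb(0.01))
    print(shear_pin_rating_ok(pin_shear_lb=20_000.0,
                              single_actuator_lb=10_000.0))
```

In this simplified model a mismatch of 0.01 inch produces a force difference of about 300 lbs, which shows why limiting the stiffness eases the synchronization problem.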

The system is protected against a servo valve jam (which affects both servo channels at one side) by a self-monitoring hydromechanical
system. A force on the valve of over 100 lbs will activate this system by displacing a spool inside the main control valve spool. This
triggers a chain of hydromechanical events which closes the main pressure shut-off valve, places the actuator in a bypass position, operates
a lock valve which in turn latches this condition (to prevent reengagement when the force on the valve disappears) and, through a
mechanical interconnect, disables the shut-off mechanism of the two remaining operative servo channels at the other side. This last feature
prevents the automatic system or the pilot from ever inadvertently deactivating the total system. The relative spool displacement can be
caused by a force on the input arm relative to the (dual) feedback arms, but the same thing happens when the two feedback arms are
desynchronized as the result of a failure (breaking or bending of an arm, etc.). Since this monitoring system is normally dormant, means are
provided to introduce an artificial valve stop and verify proper system operation. This stop cannot inhibit valve motion when the servo is
pressurized, and a special procedure is required to make it operative. This procedure cannot be applied in flight, but as an additional
protection the stop is introduced well beyond the normal valve control stroke, in the overtravel range. A periodic check verifies proper
operation of the valve jam protection mechanism as well as the protective servo interlock mechanism.
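
The trip, latch and interlock sequence described above can be pictured as a small state model. The Python sketch below is illustrative only: the class, attribute and function names are hypothetical, only the 100 lb threshold comes from the text, and the actual mechanism is hydromechanical rather than software.

```python
JAM_FORCE_THRESHOLD_LB = 100.0  # valve force that trips the protection (from the text)


class ServoSide:
    """One side of the stabilizer servo system (two servo channels per side)."""

    def __init__(self, name):
        self.name = name
        self.active = True            # pressure shut-off valve open
        self.latched_off = False      # lock valve holds the shut-down state
        self.shutoff_enabled = True   # may this side still be shut off?

    def shut_off(self):
        """Attempt to deactivate this side; refused if the interlock is set."""
        if not self.shutoff_enabled:
            return False
        self.active = False
        self.latched_off = True
        return True


def apply_valve_force(side, other_side, force_lb):
    """Trip the jam protection on `side` if the valve force exceeds the threshold."""
    if side.latched_off:
        return  # latched: removing the force does not re-engage the side
    if abs(force_lb) > JAM_FORCE_THRESHOLD_LB and side.shut_off():
        # Mechanical interconnect: the remaining side can no longer be shut off.
        other_side.shutoff_enabled = False
```

In this model a second trip on the remaining side is refused, which mirrors the statement that neither the automatic system nor the pilot can inadvertently deactivate the total system.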

A special  hydromechanical  lead-lag  network  is  incorporated  in  each  servo  loop  to  improve  the  stability  of  the  servo  system,  which 
drives the heavy horizontal stabilizer (with very low frequency structural modes) by, for this mass, relatively small actuators. However,
properly  phased  structural  feedback  effects  and  valve  damping  make  the  system  inherently  stable  even  in  the  worst  failure  modes  and 
therefore  special  redundancy  and/or  check  out  mechanisms  for  these  networks  were  unnecessary. 

Stabilizer  Servo  Input  System.  The  stabilizer  servo  system  is  driven  from  conventional  dual  column  and  dual  cable  systems.  The 
cable systems run at opposite sides of the fuselage and, after penetration of the aft pressure bulkhead and attachment to the feel system,
are coupled together by an aft-synchronizing mechanism. At both sides of this coupling mechanism the series trim input is summed with
the pilot input to drive the input arms of both servo modules via separate non-linearizers. Because of the importance of keeping the
four channels of the servo system synchronized, the total mechanism connecting the two valve modules is very rigid and mechanically
dual redundant. This includes the series trim system mechanism itself, which must provide the reaction force in the summing link.

Providing  mechanical  redundancy  to  protect  against  mechanical  failure  is  relatively  simple  but  it  is  much  more  involved  to  keep  a 
single system operative in case of a mechanical jam. Since protection against inadvertent automatic disengagement of a self-correcting
input jam correction system became very involved, and since in this case the requirements could be met with a pilot warning, automatic
instruction, and a manual correction system, this last method was selected. The warning system derives its intelligence (to determine the
location of the jam) from three two-way bungees, one in the front of each cable path and one in the servo synchronizing link. These are
actuated by a higher than normal pilot force. In most cases of a jam in the input system, control is still possible through the series trim
component of the trim system. Only a jam in the coupling mechanism between the power servos requires pilot action before significant pitch
control inputs can be made. In that case initial corrective action by the pilot consists of deactivating two stabilizer servo channels as
instructed by the detection and warning system. This action simultaneously and automatically separates the two servo inputs by means of a
decoupler in the servo synchronizing linkage, allowing immediate control from either column but requiring overriding one of the front
bungees. However, a second instruction appears at that point to open a coupler between the two columns. After decoupling, normal control is
restored  with  normal  trim  but  with  one  half  of  the  feel  force  and  from  one  column  only. 

The artificial feel forces are generated by multiple mechanical leaf springs. There are two units, each one directly connected to one
of the cable systems upstream of the servo synchronizing mechanism and each one incorporating three leaf springs. The mechanical
advantage from column to spring is determined by a variable gain mechanism. The gain of the mechanism is mechanically controlled by the
stabilizer series trim input and in addition is modified as a function of airplane Mach number by means of electric motors. Figure 5 shows
the force fan curve. The mechanism is designed so that in case of complete failure of either the series trim system or the Mach feel drive
system, the feel forces for landing will always arrive at approximately the nominal value, providing inherent fail operational system design.
The Mach feel system is dual redundant with only a single system needed for operation. The outputs of both motors are summed in a
differential gearing after each one has been rendered irreversible through a worm gear for isolation. The feel and trim systems are integrated
into each of two units, coupled across the ship by drive shafts. However, the series trim drive is completely integrated into one of the units
and will always provide the same input to both servo units, in contrast to the parallel trim, which can operate differently on one feel unit
if the two “trim and feel” units become separated by failure of the connecting drive shaft.

Pitch Autopilot Servo. Until the advent of automatic landing systems the only reliability requirement of autopilot servos was that they
be fail-safe in all conditions. This was mainly accomplished by restricting their force output authority to a safe low value and by providing
the capability for pilot override at any time. This was based on the assumption that the flight crew would always be in a position to take
over the piloting of the airplane, that is, the autopilot provided only a convenience function. With the advent of automatic landing the
picture changed. Not only did the autopilot need a large authority but in addition its function and the completion of its task became
essential under some circumstances. Therefore a new concept of autopilot servo had to be developed. Based on previous experience,
Lockheed elected to stay with electrohydraulic servos driving the power control input systems in a parallel manner. Because of the
peculiar configuration of the horizontal stabilizer power servo system, with the physically separated valve modules, the conventional way
of integrating a modulating piston servo within the power servo unit was not considered applicable. Without complex electronic
synchronization the potential differences in input commands could lead to very high fatigue loads between the two sets of power servo
actuators and could adversely affect the achievable increment control. For this reason the mechanical autopilot commands resulting from
the multiple channel autopilot servo system were integrated into a single mechanical command in a separate autopilot servo unit. This
mechanical command output is introduced in the main power control system in the servo synchronizing mechanism, thereby assuring a
synchronized command to the power servos.

The pitch autopilot servo system (refer to Figure 6) consists of two electrohydraulic servo channels which are coupled through force
limited couplers. To allow for synchronization errors in the individual channels and mod piston positions, the coupler of the channel
designated as “secondary” allows a limited motion in the coupler by applying half the force of the basic coupler. The result of this feature is
that when both channels are operating the “coupled” output will always track the primary mod piston exactly while allowing a certain
mistracking of the secondary mod piston. This prevents upsetting the electronic in-line monitoring systems and allows perfect increment
control. When the primary system fails, or the autopilot operates on the secondary channel only, the limited force/limited stroke coupler
of that channel has sufficient coupling force to drive the output as a fixed link. The coupled mod pistons output is connected to the
mechanical system output via an “engage and override mechanism.” This couples the autopilot system with the primary system with an
accurately controlled and variable limited transfer force. In those modes where the autopilot provides the conventional convenience
function, the authority is still limited to a safe value (approximately 21 lbs override force at the column), but in the approach and landing
mode, where maneuvering capability is required, the authority is increased to approximately 44 lbs. The engage mechanism is designed so
that the force is practically independent of the number of channels in operation. The coupling force is hydraulically generated and
controlled by two-value pressure controllers. The setting of these controllers is again hydraulically controlled. The pressure controllers are
backed up by two-value relief valves, the settings of which are switched simultaneously with the controllers. Pressure sensors provide
various signals to the monitors and the pilot to indicate the status of the system. Multiple solenoid pressure control of mod pistons and
couplers plus the series action of mod piston, mod piston coupler and engage coupler release capabilities provide a virtually jam proof
system, and mechanical, hydraulic and electrical redundancies provide the fail-operational characteristics.

DIRECT LIFT CONTROL SYSTEM (DLC). During the early stages of the development of the automatic landing system it became
clear  that  a Direct  Lift  Control  system,  implemented  by  control  of  some  of  the  wing  spoilers,  would  be  advantageous  in  reducing  the 
touchdown  point  dispersion  in  turbulent  air  conditions.  Therefore  the  primary  flight  control  system  specification  included  such  a system 
from  the  beginning. 

Since  wing  spoilers  are  used  for  air  brakes,  ground  lift  spoilers,  roll  augmentation,  and  direct  lift  control  all  these  functions  had  to  be 
integrated in the various systems design. In Figure 7 a schematic is shown of the mechanical interface of these systems. Because of
aerodynamic characteristics the combination of the four most inboard spoilers on each wing was best suited for lift control with minimum
pitch effect. A separate DLC servo system has been developed which normally acts as a pilot assist servo for the speed brake system, but
also accepts the electrical commands for automatic ground spoiler deployment and direct lift control. The servo system combines two
dual redundant hydromechanical servo channels with two electrohydraulic channels, integrated in such a way that a hydromechanical
and an electrohydraulic loop share the same actuator. The hydromechanical servos provide the speed brake pilot assist control. The
tandem valves are normally locked in neutral during DLC operation. However, a pilot force of about 24 lbs on the speed brake lever
overrides that valve locking mechanism. The flow capability of the mechanical valve is twice as large as that of the electrohydraulic valve so
that the pilot override action can wash out the effect of the electrical servo loops and determine the servo output without the need for
immediate disengagement of the electrical loop. A switch on the speed brake lever enables the pilot to disable the DLC servo system
completely. In that case the pilot assist as well as electrical command modes are deactivated, but direct mechanical “flow through” command
to  the  spoiler  power  servos  is  still  possible. 

During  operation  in  the  DLC  mode,  a spoiler  deflection  authority  of  plus  or  minus  8 degrees  around  an  8 degrees  spoiler-up  bias  is 
required  and  the  airplane  is  protected  against  larger  inputs  by  DLC  servo  actuator  stroke  limiters.  These  limiters  are  hydraulic  pistons 
activated  during  DLC  operation.  They  are  deactivated  for  automatic  ground  spoiler  deployment  by  a positive  signal  generated  after  the 
necessary  interlock  requirements  have  been  satisfied.  Otherwise  only  a pilot  override  action  can  remove  the  stroke  limiter. 
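
The effect of the stroke limiters on a commanded deflection can be sketched as a simple clamp. This is a minimal Python illustration assuming the limiters act symmetrically about the bias; the function and parameter names are hypothetical, and only the two 8 degree figures come from the text.

```python
DLC_BIAS_DEG = 8.0        # spoiler-up bias in DLC mode (from the text)
DLC_AUTHORITY_DEG = 8.0   # plus or minus authority around the bias (from the text)


def limit_spoiler_command(command_deg, dlc_active, limiter_removed=False):
    """Return the spoiler deflection allowed by the DLC stroke limiters."""
    if not dlc_active or limiter_removed:
        return command_deg  # limiter pistons not engaged
    low = DLC_BIAS_DEG - DLC_AUTHORITY_DEG
    high = DLC_BIAS_DEG + DLC_AUTHORITY_DEG
    return max(low, min(high, command_deg))
```

Here `limiter_removed` stands for either of the two release paths named above: the positive ground spoiler signal or a pilot override.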

Figure 8 shows the interlock diagram for a single channel of the DLC. This circuit will automatically activate the direct lift control
system in the approach (throttles retarded, flaps down etc.) and also controls the automatic deployment of the spoilers after touchdown.
In addition it provides the multiple safeguards against inadvertent spoiler deployment in flight, prevents full deployment of the spoilers
until both wheels are on the ground (limiter control) and deactivates DLC automatically when required by flight condition (stall warning
or go-around). Furthermore the in-line monitors deactivate one (or both) channel(s) when errors in the position modulation commands
are detected. The flight crew has the capability to permanently deactivate the system(s) by means of control switches on the overhead
Flight Control Electronics Systems panel. These switches are normally in the “on” position during the whole flight.
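
The interlock conditions listed above can be read as Boolean logic. The Python sketch below is a simplified illustration, not the circuit of Figure 8: the signal names are hypothetical, the conditions hidden behind "etc." are omitted, and the exact arming conditions for ground spoiler deployment are assumed.

```python
from dataclasses import dataclass


@dataclass
class DlcInputs:
    switch_on: bool            # overhead panel switch
    monitor_ok: bool           # in-line monitor has detected no error
    throttles_retarded: bool
    landing_flaps: bool
    stall_warning: bool
    go_around: bool
    left_gear_on_ground: bool
    right_gear_on_ground: bool


def dlc_channel_state(x):
    """Return (dlc_active, full_ground_spoiler_allowed) for one channel."""
    enabled = x.switch_on and x.monitor_ok
    in_approach = x.throttles_retarded and x.landing_flaps
    dlc_active = enabled and in_approach and not (x.stall_warning or x.go_around)
    # Limiter control: full deployment only with both wheels on the ground.
    on_ground = x.left_gear_on_ground and x.right_gear_on_ground
    full_deploy = enabled and x.throttles_retarded and on_ground
    return dlc_active, full_deploy
```

With one wheel on the ground the second element stays `False`, which corresponds to the stroke limiter remaining in place until both wheels are down.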

To minimize the effect of a speedbrake input cable failure, which, because of the stored energy, may cause a spoiler-extend
command, the cable system is separated into three individual cable loops in series. This reduces the energy released in case of a cable break by
a factor of three and prevents overshoot of the servo input beyond the static position entirely. All valves and spoiler servo inputs are
spring-biased to cause spoiler retraction in case of physical disengagement from the system, and spoiler servos are controlled and disengaged
in pairs (one left, one right) to minimize unwanted roll effects in case of failures.
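
The factor of three is consistent with the elastic strain energy stored in a tensioned cable. As a rough check, assume a linear-elastic cable of uniform cross-section carrying rigging tension T over a total length L, split into three loops of equal length; the symbols T, L, E (modulus) and A (cross-sectional area) and the equal-length split are assumptions, not taken from the original.

```latex
U = \frac{T^{2} L}{2 E A}
\qquad\Longrightarrow\qquad
U_{\text{loop}} = \frac{T^{2}\,(L/3)}{2 E A} = \frac{U}{3}
```

A break in one loop then releases only the energy stored in that loop, one third of what a single continuous cable would release.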

ROLL AND YAW CONTROL SYSTEMS. The roll and yaw control systems are more conventional in configuration and therefore are
only briefly touched upon. Figure ^ shows the configuration layout of the roll control system. The left inboard aileron servo system,
with three individually powered servo channels, serves normally as pilot-assist servo for the complete lateral control system, driving the
inputs to the left outboard and right inboard aileron servos and the left hand mixer mechanism, which controls the activation of the inputs
to the number 4 left and right spoiler servos. Moreover it drives the right cable system, which is normally decoupled from the control wheels
by a lost motion device. The right inboard aileron servo in turn drives the inputs to the right outboard aileron servo and the right hand
mixer mechanism controlling the number 2 and spoiler servos input activation. Left and right selector mechanisms shift the inputs of the
number 5 and spoiler servos between the outputs of the inboard aileron servos and those of the number 4 spoilers. The table in the figure
shows the schedule of activation of all spoilers. The number 1 spoilers are never used for roll augmentation and are therefore driven directly
by the DLC servo. However, since they are only used for direct lift control and ground spoilers and not for high speed air brakes, they are
electrically deactivated in all flight conditions unless landing flaps are selected. The aileron and spoiler servo systems are powered by the
hydraulic power sources in a way which requires the presence of only three hydraulic systems in any one wing and still does not introduce a
significant rolling moment when hydraulic power of one or more systems is suddenly lost. This again protects against potential loss of all
hydraulic power in case of local structural damage.


The dual channel electrohydraulic roll autopilot servo system is integrated with the left inboard aileron power servo system which
drives the total lateral control system. Although mechanically implemented quite differently from the pitch autopilot servo system, it is
also configured on the principle of force summing of the outputs of both channels, with an allowance for mistracking of the “secondary”
channel actuator by a limited force/limited stroke override. The system incorporates a hydromechanical device which will automatically
deactivate a servo channel in case of a runaway, hardover or output jam.

The  mechanical  input  system  incorporates  a torque  limiter  upstream  of  each  cable  system  which  allows  an  input  into  one  half  of  the 
system  when  the  other  half  is  jammed.  Each  limiter  incorporates  a sensor  which  when  activated  signals  the  general  location  of  the  problem 
by  lighting  instructions  and  aileron  and  spoiler  servo  control  switches  on  the  overhead  control  panel.  The  flight  crew  can  then  separate  the 
two  control  wheels  and  maintain  control  without  the  need  for  overriding  the  torque  limiter.  This  action  automatically  closes  up  the  lost 
motion  mechanism.  The  override  bungee  between  the  left  inboard  aileron  servo  output  and  the  right  inboard  aileron  servo  input  acts  as 
secondary  feel  spring  when  the  left  control  channel  is  deactivated. 

The  rudder  power  servo  control  system,  also  powered  by  three  independently  powered  servo  loops,  is  very  similar  to  the  left  inboard 
aileron servo system. However, the two electrohydraulic servos are tied in to the main system in a way to provide a series input for the
yaw  stability  augmentation  system  and  are  switched  to  a parallel  mode  (autopilot)  only  during  the  last  part  of  the  approach  for  automatic 
runway  alignment  and  rollout  when  full  rudder  control  may  be  needed. 

AUTOMATIC  FLIGHT  CONTROL 

The Avionic Flight Control System (AFCS) of the TriStar provides for manual and automatic control of flight guidance functions
and related monitoring and warning functions throughout the total flight envelope, from takeoff through landing roll-out. This is
accomplished with an integrated set of subsystems which combine to provide a reliable system that increases aircraft safety by reducing crew
work loads and by providing accurate control to desired flight tracks. The AFCS hardware, which was developed by a Collins Radio -
Lear Siegler team, has achieved the goal of providing the operational performance and reliability required for a Category III automatic
landing system.

The L-1011 AFCS consists of four integrated subsystems:

Stability  Augmentation  System  (SAS) 

Autopilot/Flight Director System (APFDS)

Speed  Control  System  (SCS) 

Flight  Control  Electronic  System  (FCES) 

The yaw stability augmentation system provides yaw damping and turn coordination throughout the entire flight regime. In addition,
the yaw SAS computers contain the Autoland® functions of runway alignment and rollout. The autopilot/flight director system provides
guidance (flight director) and automatic flight control modes for the entire flight profile, including Category IIIa automatic landing. The
speed control system provides automatic throttle computations as well as takeoff and go-around capabilities. The primary flight control
electronic system comprises several subsystems, including pitch trim (manual, automatic, Mach), Mach feel compensation, altitude alert,
stall warning, direct lift control, automatic ground speed brakes, and primary flight control monitoring.

The components which comprise the AFCS are listed by subsystem in Table 1. For total system function, these components
interface with other airplane elements such as air data sensors, attitude references, radio navigation and altimetry systems, electrohydraulic
and electrical flight control servos and flight instruments. Figure 9A is a photograph of the 4K separate AFCS items. The nine computers
are all 3/4 ATR long units, providing standardization of many mechanical as well as electronic components. The computers comprise 68%
of the 261 pound total for all the items shown.

Functional element circuit implementation and modular packaging are features of the computers. The hardware, designed during
the early phases of L-1011 development, uses D.C. analog computations and digital logic. Figure 9B is a photograph of the pitch
computer chassis, illustrating the construction common to all AFCS computers. A multi-layer printed circuit sideboard provides the
interconnect for the plug-in printed circuit cards. The sideboard wiring is physically separated for each of the redundant channels to ensure
maximum integrity. The separation is maintained within the computer connectors and throughout the aircraft wiring as well as on the
circuit cards.

Figure  9C  is  a photograph  of  one  of  the  printed  circuit  cards.  A clear  area  is  provided  between  the  redundant  elements  although  in 
some  cases  monitor  circuits  are  mounted  in  the  center  of  the  card. 

STABILITY  AUGMENTATION  SYSTEM  (SAS) 

System  Mechanization 


Figure  10  depicts  the  cruise  configuration  of  the  SAS.  Each  of  the  two  yaw  computers  contains  two  computations  that  output 
identical  servo  commands  to  an  in-line  monitored  electrohydraulic  servo.  Four  aileron  position  transducers  and  three  rate  gyros  service 
the four computations of the total system. The rate gyros provide for Dutch roll damping inputs and the aileron transducers provide
for  turn  coordination. 

Figure 11 shows a SAS cruise computation. It is seen that the gains are scheduled with flap position and the gyro path has the usual
low frequency washout filter plus a high frequency cut-off. The aileron input path has a limited washout to remove aileron trim effects
and a scheduled dead zone such that turn coordination only comes through sufficiently large aileron inputs. The passed signal is subject
to gain changing to match the gyro path and to low pass filtering. The voter output to rudder surface response can be approximated by
a two Hz second order servo for small amplitudes. However, the primary control surface servo is severely hinge limited in cruise flight.
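
The signal-shaping elements named above (washout, high frequency cut-off, dead zone) can be sketched as discrete-time blocks. The Python below is an illustration only: the original hardware is D.C. analog, the class names are hypothetical, and any time constants or dead-zone widths passed in are assumptions, since the text does not give them.

```python
class Washout:
    """First-order high-pass (washout), tau*s / (tau*s + 1), backward-Euler form."""

    def __init__(self, tau_s, dt_s):
        self.a = tau_s / (tau_s + dt_s)
        self.prev_in = 0.0
        self.prev_out = 0.0

    def step(self, x):
        y = self.a * (self.prev_out + x - self.prev_in)
        self.prev_in, self.prev_out = x, y
        return y


class LowPass:
    """First-order low-pass, 1 / (tau*s + 1), backward-Euler form."""

    def __init__(self, tau_s, dt_s):
        self.b = dt_s / (tau_s + dt_s)
        self.out = 0.0

    def step(self, x):
        self.out += self.b * (x - self.out)
        return self.out


def dead_zone(x, width):
    """Pass only the part of x that exceeds +/- width."""
    if x > width:
        return x - width
    if x < -width:
        return x + width
    return 0.0
```

A steady input (for example a constant aileron trim offset) decays to zero through `Washout`, which is the trim-removal behaviour described for the aileron path, while `dead_zone` blocks small aileron inputs from reaching the turn coordination path.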

It is noted that in Figure 11 the output of a computation comprises one input to a voter. As one would expect, there are two voters
per yaw computer with two voter outputs required to drive one SAS electrohydraulic servo as depicted in Figure 12. The two voter
outputs provide for driving the electrohydraulic valve (EHV) coils in a push-pull arrangement with two sets of redundant monitors acting to
shut off the servo loop hydraulics if a fault is detected. One set (CS1) monitors the unbalance between the two halves of the servo drive
and the other set (CS2) monitors input command versus output position.

Figure 13 shows the voter input crossfeeding for the complete dual-dual system. There are redundant monitors in front of the
voters which control the signal configuration of the voter inputs as shown in Figure 14. This figure illustrates the concept whereby the
monitors control switching logic that substitutes signal ground or an alternate computation for a faulted channel.

In addition to servo and computation monitors, there are rate gyro monitors and electrical power monitors. The latter operate into
the servo engage logic while the former operate into the voter switching logic.

It is seen from the foregoing that the voters are important to this dual-dual arrangement in performing two very significant functions:

• The  voters  equalize  the  four  computation  output  signals  which  are  then  used  by  the  in-line  monitored  servos. 

• The  voters  pass  a good  signal  while  a faulty  computation  is  being  detected  by  the  pre-voter  monitors. 

Both  these  functions  are  essential  to  minimizing  nuisance  monitor  trips  with  acceptable  monitor  detection  levels. 
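
The two voter functions can be illustrated with a mid-value selector. The source does not state the voter algorithm or the number of inputs per voter, so the Python sketch below assumes median selection with three inputs in the examples; the function names are hypothetical. The substitution of signal ground or an alternate computation for a faulted input follows the description of the pre-voter monitors.

```python
from statistics import median

SIGNAL_GROUND = 0.0


def condition_inputs(signals, faulted, alternates=None):
    """Replace each faulted input by its alternate computation, else by signal ground."""
    alternates = alternates or {}
    return [
        alternates.get(i, SIGNAL_GROUND) if i in faulted else s
        for i, s in enumerate(signals)
    ]


def vote(signals, faulted=(), alternates=None):
    """Mid-value voter: a single wild input cannot drive the output."""
    return median(condition_inputs(signals, set(faulted), alternates))
```

For example, `vote([1.0, 25.0, 0.9])` still returns `1.0` although the second input has run away, which is the sense in which a voter passes a good signal while the pre-voter monitor is still detecting the fault.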

Design  Objectives  and  Performance 

The  function  of  the  cruise  mode  of  the  SAS  is,  of  course,  to  provide  improved  Dutch  roll  damping  for  enhancement  of  passenger 
comfort and handling qualities and for reduction of loads on the vertical tail. This reduction of fin tail loading, in continuous turbulence,
was  reflected  in  the  definition  of  limit  design  loads. 

Early in the development of the L-1011, the effectiveness of the SAS was investigated to determine performance and reliability
objectives for the SAS from a loads viewpoint. It appeared that a minimum damping ratio of 0.3 and a timewise availability of 97% were
modest  design  objectives  that  would  yield  significant  load  reductions.  It  was  subsequently  found,  however,  that  higher  damping  ratios 
could  be  achieved  over  most  of  the  climb,  cruise  and  descent  flight  regimes  as  seen  from  the  data  given  in  Table  2.  Only  at  low  speeds, 
where  effects  on  fin  loads  are  not  critical,  are  the  damping  ratios  less  than  0.3. 

It also became evident that a 97% availability requirement was a very conservative estimate of system reliability. The single channel
failure rate was calculated to be about 10'^ per hour and, to preclude the possibility that an airplane might be flown without SAS for a
protracted period, it is required that at least one of the two channels be operative for dispatch. Recognizing that for most flights both
channels of SAS are operative, even 99.9% timewise availability would appear to be conservative.

A complete discussion of the effect of SAS availability on loads is given in Reference (1), from which Figure 15 is taken. This figure
illustrates the definition of design loading for vertical tail shear with 0, 97 and 100% SAS availabilities. It is based on a mission analysis
criterion whereby the frequency of exceedance of a load quantity is calculated for operations over specified design flight profiles. The
turbulence  environment  as  statistically  described  for  each  segment  of  a profile  is  applied  to  the  airplane/load  transfer  function  to  derive 
exceedance  curves  (with  or  without  SAS  operating)  for  each  segment.  The  segment  exceedances  are  summed  over  the  total  of  all  profiles 
to  determine  a load  vs.  frequency-of-exceedance  curve  for  the  mission. 
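
One way to picture how availability enters such a mission analysis is to weight the SAS-on and SAS-off exceedance curves by the fraction of time each applies and then read off the load at a fixed design exceedance rate. The Python sketch below assumes this time weighting and simple exponential exceedance curves; the function names and all numerical parameters are hypothetical and are not taken from Reference (1).

```python
import math


def exceedance(load, n0, b):
    """Exceedances per hour of `load` for an exponential exceedance curve."""
    return n0 * math.exp(-load / b)


def mission_exceedance(load, availability, sas_on, sas_off):
    """Weight the SAS-on and SAS-off curves by the time fraction each applies."""
    return (availability * exceedance(load, *sas_on)
            + (1.0 - availability) * exceedance(load, *sas_off))


def design_load(target_rate, availability, sas_on, sas_off, hi=1e3):
    """Bisect for the load whose mission exceedance rate equals target_rate."""
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mission_exceedance(mid, availability, sas_on, sas_off) > target_rate:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `design_load` with availabilities of 0, 0.97 and 1.0 for the same pair of curves gives three design loads, with the 0.97 value lying between the other two.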

It can be seen from Figure 15 that the major reduction (H/F = 0.70) is realized by having at least 97% availability, and further
reduction comes less readily, with 100% availability realizing a ratio of G/F = 0.65. These results are for a fully linear system and saturation
effects reduce the benefits somewhat. In summary, however, with 97% availability the net reduction in fin loading is better than 35%
compared to what it would be if no SAS were available. In actual practice the SAS availability has proven to be much better than
99.99%, leading to the conclusion that for all practical purposes the load alleviation is in accordance with the 100% expectation.

AUTOPILOT/FLIGHT DIRECTOR SYSTEM (APFDS)

The  APFDS  consists  essentially  of  two  integrated  autopilot/flight  director  channels  whose  operation  is  controlled  from  a single 
set  of  control  panels  mounted  in  the  glareshield.  The  hardware  includes  two  identical  pitch  and  roll  computers  each  of  which  contains  a 
single “cruise” computation and dual automatic landing computations. The computations are common for both autopilot and flight
director  control;  however,  some  separation  is  retained  where  required  to  provide  performance  unique  to  the  autopilot  or  flight  director 
function. 

The  APFDS  provides  the  usual  pitch  and  roll  modes: 

Roll  and  Pitch  Attitude  Hold  with  Control  Wheel  Steering  (CWS)  and  Turbulence  Configuration  Control 

Altitude  Select  and  Hold 

Vertical  Speed  Select  and  Hold 

Airspeed  Hold 

Mach  Hold 

Heading  Select  and  Hold 

VOR  and  Area  Navigation  (Lateral  and  Vertical) 

Localizer  Capture  and  Track 

In  addition,  there  are  the  common  axis  modes  of: 

Approach 

Approach/land  (Automatic  Landing) 

Go-Around

Takeoff

The pitch commands for Go-Around and Takeoff are derived in the Speed Control Computer (SCS) with Takeoff being a flight director
mode only. The Approach/Land mode is used for automatic landing and is the only fail-operative autopilot mode. The remaining modes
are the “cruise” modes; their mechanization is loosely referred to as the cruise autopilot.

CRUISE AUTOPILOT

System Mechanization

Figure 16 illustrates the redundancy and monitoring characteristics of the APFDS cruise configuration. Servo monitoring and pre-voter
monitoring and switching are not shown in Figure 16; these features are, however, similar to those shown in Figures 12 and 14.

During cruise operation, only one autopilot servo can be engaged; however, both APFDS channels are available for independent
flight director control. As shown in Figure 16, if channel "A" autopilot is engaged, the "A" cruise computation is processed to a
completely redundant inner loop comprised of quadruplex command limits and voter/monitor functions. The pitch axis limit is on attitude
rate command while the roll limit is on aileron command. The voter outputs provide redundant signals to the in-line monitored servo
loop. Therefore, failures of the single cruise computation are command limited for fail-safe performance. Fail-passive performance is
provided for failures of the redundant inner loop or servo system. Over and above the protection provided by the command limiters,
there are back-up fail-safe provisions to assure that hazardous failures are indeed extremely improbable. The back-up autopilot safety
feature for the pitch axis consists of an autopilot servo authority limit. The servo output coupler can apply only a limited force to deflect
the controls, the force being resisted by the feel spring. Under the most adverse conditions (high Mach, high q) the transient responses to
any AFCS failures are thereby limited to less than 1 g (delta). The back-up provision for the roll axis consists of automatic autopilot
disconnect in the event that the total aileron angle exceeds 28 degrees (70%). The detectors in this circuitry are dual switches (for each
autopilot), either of which can effect a disconnect through independent circuits. It is necessary to override these limiters, however, during
automatic landing, for example, and in CWS with a pilot wheel input applied. These pitch and roll back-up provisions and the inherent
rate limiting of the control system servos adequately limit the airplane response to hardovers, slowovers and oscillatory failures.
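
The command limiting and voting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the L-1011 mechanization: the function names, the limit value, the miscompare threshold, and the use of a median as the voter are all hypothetical choices made here to show how a single unlimited computation can feed a redundant limiter/voter stage.

```python
from statistics import median


def limit(command, authority):
    """Command limiter: clamp a command to +/- authority."""
    return max(-authority, min(authority, command))


def vote(channels, miscompare_threshold):
    """Mid-value voter with a simple monitor (assumed voter type).

    Returns the voted value and the indices of channels that disagree
    with it by more than miscompare_threshold.
    """
    voted = median(channels)  # for four inputs: mean of the two middle values
    failed = [i for i, value in enumerate(channels)
              if abs(value - voted) > miscompare_threshold]
    return voted, failed


def inner_loop(cruise_command, channel_faults, authority=2.0,
               miscompare_threshold=0.5):
    """Feed one cruise command through four limiter channels and vote.

    channel_faults holds an error added to each channel's output, so a
    fault inside the redundant inner loop can be simulated. The default
    authority and threshold are illustrative numbers only.
    """
    limited = [limit(cruise_command, authority) + fault
               for fault in channel_faults]
    return vote(limited, miscompare_threshold)


# A hardover from the single cruise computation is held to the limit value.
print(inner_loop(100.0, [0.0, 0.0, 0.0, 0.0]))
# A fault in one redundant channel is outvoted and flagged by the monitor.
print(inner_loop(1.0, [0.0, 0.0, 0.0, 50.0]))
```

The two calls mirror the two protection levels in the text: the first shows a failure upstream of the limiters being bounded (fail-safe), the second shows a failure inside the redundant loop being rejected (fail-passive).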

The basic mode of the autopilot is attitude/heading hold with control wheel steering (CWS). In order to alter his reference, the pilot
need only apply control wheel force in the desired axis and maneuver the aircraft to the new reference. No manual mode select or knob
twist is necessary. It is noted that while pilot force is applied, the command limits are increased so as not to restrict the pilot's maneuvering
capability. Redundant fail-safe detector circuits provide for adjustment of the limits.

The  engage  features  of  the  APFDS  provide  increased  flexibility  by  allowing  a coupled  command  mode  in  one  axis  and  CWS  in  the 
other.  This  expands  the  operational  capabilities  and  eases  pilot  work  loads  in  maneuvers  such  as  holding  patterns,  where,  for  example, 
altitude  hold  could  be  used  in  pitch  with  CWS  retained  in  roll.  Since  the  system  provides  a separate  CWS  engage  position,  flight  director 
command  modes  may  be  displayed  while  using  the  CWS  features.  A softened  CWS  mode  is  provided  by  the  turbulence  mode.  This  enables 
rough  weather  penetration  with  attitude  hold  features  at  reduced  control  activity  levels. 

The altitude capture system has been integrated with the altitude alert system. This allows a pilot to set a clearance altitude,
automatically capture and hold that altitude, and arm the altitude alert function with a single pilot action.

The localizer and approach modes provide Category I capability for use on beams which are not suitable for automatic landing and
for non-precision approaches. The approach/land mode provides Category II approach capability and the fully automatic landing and
rollout functions. The takeoff mode provides flight director guidance for programmed climbout capability, while the go-around mode
provides both automatic and flight director missed approach capability.

Figure 17 shows the AFCS controls and displays. A single surface position indicator is provided to indicate the position of trim and
all of the primary control surfaces. A single set of autopilot/flight director and speed control panels is located on the glareshield. These
consist of five panel modules: the Thrust (autothrottle), Heading/Pitch Mode, APFDS Engage, Navigation Mode, and Altitude
Select panels. In addition, two AFCS mode indicators and two AFCS warning indicators are provided, one on each pilot's panel.

Each autopilot may be engaged via the appropriate solenoid-held bathandle in either the CWS or command configuration. Flight
director indications are controlled by independent selector switches also located on the APFDS engage panel. The autopilot may be
disconnected at any time by means of disconnect switches located on each control wheel or by use of the bathandles. The autothrottle system
is engaged by its own solenoid-held bathandle and may be disengaged via the bathandles or by disconnect switches on the throttle levers.

Since  the  AFCS  consists  of  two  integrated  autopilot/flight  director  systems,  each  flight  director  necessarily  displays  the  same  mode 
as  its  respective  autopilot.  Combined  with  the  requirement  that  the  two  autopilot/flight  director  systems  remain  mode  synchronized,  an 
effective  pilot  interface  is  achieved  by  the  use  of  a common  mode  controller  for  both  APFDS  systems.  Redundant  contacts  on  each  mode 
select  button  maintain  the  integrity  and  operational  availability  advantages  of  a dual  autopilot  installation.  Mode  selection  is  annunciated 
by  an  illuminated  mode  select  button  and  is  confirmed  by  the  two  APFDS  mode  annunciators. 

Design  Objectives  and  Performance 

The  safety  objective  for  the  cruise  autopilot  was  to  achieve  AFCS  fail-safe  performance  such  that  the  backup  failure  protective  means 
would  be  needed  less  than  once  per  10^  hours.  This  would  assure  that  potentially  hazardous  events  would  be  extremely  improbable. 

This  10"^  per  hour  system  failure  rate  was  readily  achieved  because  of  the  basic  redundancies  which  were  obviously  needed  to  meet  the 
Category III requirements.

The significant failures are downstream of the command limiters and voters, in the servo area and in the disconnect circuits.
Figure 18 shows the disconnect circuit redundancy for one autopilot. It is seen that there are two valid/logic paths, either of which can cause
a disconnect. Path A holds the bathandle solenoids engaged and path B engages the servos. There are four ways to deenergize the servo
solenoids: removing servo solenoid grounds, removing bathandle solenoid grounds, actuating control wheel disconnect switches (captain's
and first officer's), or pushing the bathandle to the "off" position.
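
The four ways of deenergizing the servo solenoids amount to a simple AND condition for remaining engaged. The Python sketch below illustrates that logic only; the field names are hypothetical and the real circuit of Figure 18 is hardware, not software.

```python
from dataclasses import dataclass


@dataclass
class DisconnectInputs:
    """State of the four disconnect means named in the text (names assumed)."""
    servo_ground_present: bool = True       # path B valid/logic holds this ground
    bathandle_ground_present: bool = True   # path A valid/logic holds this ground
    captain_switch_pressed: bool = False    # control wheel disconnect switch
    first_officer_switch_pressed: bool = False
    bathandle_off: bool = False             # bathandle pushed to "off"


def servos_engaged(inputs):
    """Servo solenoids stay energized only if no disconnect means is active."""
    wheel_switch = (inputs.captain_switch_pressed
                    or inputs.first_officer_switch_pressed)
    return (inputs.servo_ground_present
            and inputs.bathandle_ground_present
            and not wheel_switch
            and not inputs.bathandle_off)


print(servos_engaged(DisconnectInputs()))
print(servos_engaged(DisconnectInputs(bathandle_ground_present=False)))
```

Because every term must hold for the servos to stay engaged, losing either valid/logic path on its own is enough to cause a disconnect, which is the redundancy property the paragraph describes.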

During certification testing, demonstrations of the three basic failure protection methods were performed. Hardovers were applied
upstream of the command limiters to show the responses at the limit values. Faults were applied in the servo circuits to demonstrate their
fail-passive nature. Hardovers and slowovers were applied with monitors by-passed to verify the fail-safe characteristics of the back-up
protective means. Oscillatory failures at frequencies above which the various failure protection methods would not work were cleared by
showing that the maximum rate capabilities of the various servos could not provide significant amplitudes.
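
The rate-capability argument for oscillatory failures can be illustrated with a short calculation. A servo whose output rate is limited to R can, at best, trace a triangular wave at frequency f, and the peak amplitude of that wave is R/(4f); any oscillation the servo actually produces at that frequency is no larger. The Python sketch below uses a hypothetical rate limit and hypothetical frequencies, not L-1011 values.

```python
def max_oscillation_amplitude(rate_limit, frequency_hz):
    """Upper bound on the peak amplitude of a rate-limited oscillation.

    A rate-limited output moving at full rate in alternating directions
    traces a triangle wave. It travels from -A to +A in half a period, so
    2*A = rate_limit / (2 * frequency_hz), i.e. A = rate_limit / (4 * f).
    """
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return rate_limit / (4.0 * frequency_hz)


# Hypothetical servo with a 40 deg/s maximum rate (illustrative value only).
for f in (0.5, 2.0, 5.0):
    print(f, "Hz ->", max_oscillation_amplitude(40.0, f), "deg")
```

The bound falls inversely with frequency, which is why high-frequency oscillatory failures can be cleared on servo rate capability alone.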

AUTOMATIC LANDING SYSTEM (ALS)

System  Mechanization 


The principal elements of the ALS are the APFDS and SAS and their respective sensors in the configurations established below
1500 feet (radio altitude) with the Approach/Land (A/L) mode selected. As seen from Table 3, the system in total definition includes
much more than these units, but these have the most effect on system reliability and availability. Reliability is used here in the sense of
the system capability to complete a landing. It relates directly to safety, particularly in low weather minima operations. It was, of course,
the Category III requirement that dictated the extent of ALS redundancy, which is depicted in some generality in Figure 19 for the pitch
and roll control axes. Each of these axes uses three accelerometers (normal or lateral) and three attitude inputs. Pitch computations use
only derived pitch rate; roll uses both attitude and derived rate signals. The primary Autoland Sensor signals are glideslope (G/S) error
and radio altitude for pitch and localizer (LOC) error for roll. Only two each of the Autoland Sensors are used, but each has dual
outputs with high integrity self-monitoring. For example, the probability of the two signals from one G/S receiver being faulted at a critical
time  without  warning  signals  to  the  pitch  computer  is  less  than  lO"*^. 

The same theme of APFDS redundancy is carried over into the SAS in the A/L mode, as seen in Figure 20. Here, the exception is
that only two compass systems are utilized, which do not have the integrity of an Autoland Sensor. The redundancy requirement,
however, is not as great for yaw control as it is for pitch and roll. Automatic landings with no automatic yaw control have been demonstrated
without any significant effect except that the pilot had to control the alignment and rollout. The compass inputs are actually compared
and averaged in the SAS computers and used to define a reference heading error which is memorized. The compass signals are switched
out at 150 feet and integrated rate gyro data is used from there to touchdown. (The radio altitude signals used to control the transition
are omitted from Figure 20.) Below 150 feet, a maneuver is performed whereby the aircraft fuselage is aligned with the runway and a
limited wing down is held against crosswind. Up to eight degrees of crab can be removed in this manner. The basic inputs for this control
are heading error and localizer error, and the latter continues to control the airplane in rollout through rudder and nose wheel steering.
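
The handover from compass data to integrated rate gyro data at 150 feet can be sketched as follows. This Python illustration is a simplification under stated assumptions: the class and parameter names are hypothetical, the compass comparison (miscompare check) is omitted, and the sign convention of the yaw rate is chosen arbitrarily.

```python
class HeadingErrorEstimator:
    """Heading error from averaged compasses above the switch altitude,
    then from integrated yaw rate below it (illustrative sketch)."""

    def __init__(self, switch_altitude_ft=150.0):
        self.switch_altitude_ft = switch_altitude_ft
        self.heading_error_deg = 0.0  # the memorized reference

    def update(self, radio_altitude_ft, compass_errors_deg, yaw_rate_dps, dt_s):
        if radio_altitude_ft > self.switch_altitude_ft:
            # Average the two compass-derived heading errors and memorize.
            self.heading_error_deg = (sum(compass_errors_deg)
                                      / len(compass_errors_deg))
        else:
            # Compass signals switched out: propagate with rate gyro data.
            self.heading_error_deg += yaw_rate_dps * dt_s
        return self.heading_error_deg


estimator = HeadingErrorEstimator()
print(estimator.update(200.0, [4.0, 6.0], 0.0, 0.1))    # compass average
print(estimator.update(100.0, [99.0, 99.0], -1.0, 0.5))  # gyro integration
```

In the second call the compass values are deliberately wrong to show that they no longer influence the estimate once the aircraft is below the switch altitude.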

This rollout mode, which is effective to low ground speed, is a safeguard against patchy or swirling patterns that might temporarily cause
loss of visual reference. It is also necessary for extending the system capability to Category IIIb.

As would be expected, the fail-operative pitch, roll and yaw (below 150 feet) mechanizations closely follow those depicted in
Figures 12, 13, and 14 for the cruise yaw control. Four computation channels for each axis are needed for the fail-operative condition and
two or three for the fail-passive condition. The latter configuration is acceptable for Category II operations while the former is required
down to the alert height for Category IIIa. There are minor differences in the servo control and monitoring mechanizations of each axis,
but the basic concepts of Figure 12 are applied. Category III, of course, requires two servos per axis, while one is acceptable for Category II.
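
The redundancy rules stated above can be summarized as a small lookup. The Python function below is a hedged reading of the paragraph, not a certification rule: the function name is hypothetical, and it considers only the channel and servo counts mentioned in the text, ignoring every other requirement for a given category.

```python
def landing_category(channels_per_axis, servos_per_axis):
    """Highest landing category the stated redundancy supports, or None.

    Four channels and two servos per axis give the fail-operative
    configuration (Category III); two or three channels and at least one
    servo give the fail-passive configuration (Category II).
    """
    if channels_per_axis >= 4 and servos_per_axis >= 2:
        return "III"
    if channels_per_axis >= 2 and servos_per_axis >= 1:
        return "II"
    return None


print(landing_category(4, 2))
print(landing_category(3, 2))
```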

Although  not  considered  as  necessary  to  automatic  landing  as  pitch,  roll,  and  yaw  control,  the  functions  provided  by  the  SCS  are 
important  to  Category  III  as  automatic  throttle  is  required  as  is  automatic  go-around  in  some  cases. 

There  are  two  throttle  control  modes  — airspeed  hold  with  pilot  selectable  reference  and  stall  margin  angle-of-attack  hold  with  the 
latter  mode  acting  as  a floor  on  the  former.  Airspeed  hold  is  used  normally  until  landing  flap  is  selected  at  which  time  the  stall  margin 
mode  is  forced  if  the  autothrottle  system  (ATS)  is  engaged.  The  angle-of-attack  reference  corresponds  to  a speed  of  from  3 to  10  knots 
above  reference  speed  (depending  on  airplane  center  of  gravity)  and  is  not  adjustable  except  automatically  when  spoilers  are  operated 
(for  DLC)  and  when  down  and/or  tail  gusts  are  detected.  The  "gust  sniffer”  circuits  can  add  the  equivalent  of  about  4 and  8 knots  to  the 
reference  speed  for  20  seconds  (or  longer  if  turbulence  levels  are  continuously  detected).  When  the  airplane  altitude  reduces  to  50  feet 
(radio),  a controlled  deceleration  is  programmed  until  touchdown  when  the  throttles  are  automatically  brought  to  ground  idle  and  the 
system  automatically  disconnected. 
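
The autothrottle mode logic in this paragraph can be sketched in Python as below. It is an illustration under several assumptions: the function names are hypothetical, the stall margin floor is expressed here as an equivalent speed rather than as the angle-of-attack reference the real system uses, and the gust increment is modeled as a fixed value held for 20 seconds after the last detection.

```python
def throttle_mode(ats_engaged, landing_flap_selected):
    """Active throttle control mode, or None if the ATS is not engaged."""
    if not ats_engaged:
        return None
    # Selecting landing flap forces the stall margin mode.
    return "stall_margin" if landing_flap_selected else "airspeed_hold"


def effective_speed_target(selected_speed_kt, stall_margin_speed_kt):
    """Stall margin acts as a floor on the pilot-selected airspeed."""
    return max(selected_speed_kt, stall_margin_speed_kt)


def gust_increment(seconds_since_gust, increment_kt, hold_s=20.0):
    """Speed increment from the gust sniffer, held for hold_s seconds."""
    if seconds_since_gust is None:  # no gust detected
        return 0.0
    return increment_kt if seconds_since_gust <= hold_s else 0.0


print(throttle_mode(True, landing_flap_selected=True))
print(effective_speed_target(130.0, 138.0))
print(gust_increment(5.0, 4.0))
```

Continuous turbulence would keep resetting the time since the last detection, which reproduces the "or longer" behavior mentioned in the text.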

For the stall margin mode the computation output commands a throttle rate. With much the same computation scheme the SCS
commands a pitch rate when the go-around mode is engaged. In this case the basic reference is an angle-of-attack corresponding to around
1.25 times stall speed.

It is noted that a more complete description of the system, including the basic control laws, is given in Reference 2. The control
laws determine the performance in the presence of disturbances, which is fundamental to safety, but rather than present them herein, it is
elected to introduce three of their features which are a part of the L-1011 ALS to improve performance and safety.

It was pointed out in the discussion on Primary Flight Control that one of the subsystems of the AFCS is Direct Lift Control (DLC),
which was included in the basic L-1011 design for the purpose of enhancing the performance and safety of the aircraft during approach
and landing. This subsystem is commonly associated with automatic landing and so is described in this section. It is utilized, however,
whenever landing flaps are selected, with or without the APFDS engaged in any mode.

The basic DLC mechanization scheme is shown in Figure 21. Stabilizer motion relative to the pitch trim position is measured
electrically and after shaping is used to position the DLC servo which drives the linkages to the eight DLC spoilers. The two stabilizer motion
transducers are each dual; they are, in fact, the same ones which are used for the fail-operative autotrim system. The DLC servo is also
dual (each in-line monitored) and the computations are dual-dual, although voters are not required as the computations are simple and
track adequately. The system response to pitch control is invariant whether pilot or autopilot moves the controls.

Two means are employed to minimize the effects of ILS signal anomalies.

• Radio inputs to the computations are held at fixed levels for short periods (1 second for G/S and 5 seconds for LOC) before a
  disconnect is effected because of an ILS radio flag.

• Large LOC deviations (without a flag) result in clamping the radio inputs for up to 8 seconds before normal tracking is resumed.
  The level of "large" deviation which causes the clamping is scheduled with radio altitude, closing to 120 microamps near the
  threshold.

The  need  to  cope  with  ILS  anomalies  should  be  minimal  in  Category  III  conditions  but  these  do,  however,  seem  to  be  prevalent,  particularly 
beam  “overflight”  disturbances,  in  the  less  severe  weather  conditions. 
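
The first of the two protective means, holding the radio input while a flag is present and disconnecting only if the flag persists, can be sketched in Python as follows. The hold times come from the text; the class name, the update interface, and the use of a simple elapsed-time counter are assumptions made for illustration.

```python
# Hold times before disconnect, in seconds, as stated in the text.
HOLD_BEFORE_DISCONNECT_S = {"G/S": 1.0, "LOC": 5.0}


class FlagHold:
    """Hold the last good radio input while an ILS flag is present."""

    def __init__(self, signal):
        self.hold_limit_s = HOLD_BEFORE_DISCONNECT_S[signal]
        self.held_value = 0.0
        self.flag_time_s = 0.0

    def update(self, raw_value, flagged, dt_s):
        """Return (value passed to the computation, disconnect requested)."""
        if not flagged:
            self.held_value = raw_value
            self.flag_time_s = 0.0
            return raw_value, False
        # Flag present: freeze the input and count how long it has lasted.
        self.flag_time_s += dt_s
        return self.held_value, self.flag_time_s > self.hold_limit_s


glideslope = FlagHold("G/S")
print(glideslope.update(10.0, False, 0.5))  # normal tracking
print(glideslope.update(99.0, True, 0.5))   # flag: input frozen
print(glideslope.update(99.0, True, 0.5))   # still within the hold period
print(glideslope.update(99.0, True, 0.5))   # hold period exceeded
```

A short flag therefore produces no transient at all, while a persistent one leads to a disconnect from a frozen, known-good input rather than from a corrupted one.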

Another safety-oriented feature of the ALS is the manner in which automatic pitch trim is used. At the time the fully redundant
ALS configuration is established (1500 feet), a trim bias is inserted in the fly-up direction. The autopilot has to control (nose down) against
this bias so that a subsequent disconnect will result in a definite pitch-up maneuver. This also has more significance in a Category I or II
situation than in Category III conditions.

Design  Objectives  and  Performance 

The L-1011 program objective with respect to the ALS was to achieve a timely Category IIIa certification with a system having the
potential for Category IIIb. The design objectives were to produce an automatic landing system with superior performance, reliability and
availability so that the full safety potential of automatic landing could be realized.

The L-1011 has been in service long enough to have proven that the system performs in accordance with expectation. The L-1011
ALS average performance in the real world environment is as follows:

• Longitudinal  touchdown,  95%  footprint  - 800  feet 

• Touchdown  sink  rate 

average,  1.9  feet/second 
95%  upper  limit,  3.0  feet/second 

• Lateral  touchdown,  95%  footprint  - 42  feet 

This data applies to the maximum landing flap case (42 degrees). For the 33 degree landing flap case, the data is the same except that the
longitudinal touchdown footprint is increased by less than 15%.

The above performance data applies to the Category II/III system, which requires DLC. The system can be used without DLC in other
than Category II/III conditions; however, the longitudinal dispersion is somewhat greater. Although this could be modified by increasing
the sink rate at touchdown, the DLC reliability is so high that there is no benefit to having two different control laws (with and without
DLC), one of which would give significantly higher touchdown sink rates. The advantage of DLC is illustrated in Figure 22. This figure
shows flight test data of landings with and without DLC made under comparable conditions. The systems were similar except in two
respects: the use of DLC as mentioned and the adjustment for nominal sink rate at touchdown. It is seen in Figure 22 that the system
without DLC has the larger footprint with significantly higher sink rates. As the improved controllability of DLC results from the faster
acceleration response time, similar benefits are realized in manual flying, though these have not been so easily quantified.

The effectiveness of the computation circuits in suppressing LOC "overflight" disturbances is shown in Figure 23. The figure gives
results of simulator landings, several hundred per disturbance duration, made with 15 knot crosswind and 2.5 microamp rms radio noise
plus LOC hardovers superimposed at low altitude (below 50 feet). An "unprotected" hardover of this magnitude, even for a few seconds
and with no wind, would have the airplane off the runway unless the pilot took corrective action promptly.

Of course, with protective circuits of this nature, hardover magnitudes around the detection level give the greatest airplane response,
but the probabilities of landing off the runway are acceptably remote without pilot intervention. In passing it can be said that the
simulator also showed a small but measurable improvement in landing footprint as a result of the noise filtering characteristic of the
detection/hold circuit.

In addition to meeting the landing footprint performance requirements, it is necessary to assure adequate threshold clearances. This
follows readily, as long as the ALS is functioning and the footprint itself is within acceptable limits, as it is for the L-1011. In the event of
a low altitude disconnect prior to the threshold, however, it is not obvious that the clearance will be adequate unless some nose-up input
is automatically applied at disconnect. That such an input is sufficient can be seen by comparing (on a statistical basis) the sink rates at a
wheel height of 50 feet (radio) plus Δt seconds for the normally functioning ALS and for a system having a total disconnect at 50 feet.

If the sink rates in the latter case are less than for the normally functioning system, the expected ground clearances following disconnect
would be greater than for the normally functioning system. This was evaluated by simulation for two values of Δt with limiting disturbances
(26 knots wind, 2.5 microamp rms radio noise). The probabilities of the disconnected system sink rates being greater than the normal
system  rates  were  5 x 10"^  for  At  = 1 second  and  3 x ICT^  for  At  = 2 seconds.  These  probabilities  become  rapidly  smaller  as  the  distun 
bance  levels  decrease  so  that  it  can  be  appreciated  that  considering  the  probabilities  of  disconnect  and  of  encountering  limiting  disturbance 
conditions, the trim bias is sufficient to assure adequate clearance heights following a disconnect even without pilot intervention.
Considering that the pilot attempts to land following the disconnect, the trim bias may tend to cause a longer landing distance than otherwise, but
this appears to be better than the alternative.

As service experience is proving the acceptability of the ALS fault-free performance, so is it proving out its reliability. Certification
to Category IIIa by the manufacturer is only an initial step toward achieving certification for use in revenue operations. Each operator
must verify his capability to use the system. Three L-1011 operators have done so and at least one other is proceeding toward that objective.

One of the things an operator must show to achieve ALS certification is that the system reliability in the airline environment is
compatible with the failure rates used in the Lockheed certification analysis. One indication of this is a comparison of the failure rates
achieved in service with those used in that analysis. In effect, MTBF tracking limits are defined. Table 4 shows a list of
MTBF lower limits and their currently estimated values. The data given in this table are for the significant contributors to the total
disconnect probability (below the alert height). If the MTBFs of all the listed units were at the lower limits, the total disconnect probability
would be potentially a factor of two higher, still within acceptable limits. These "lower limits" are not absolute limits, because
the factor of two does not raise the disconnect probability to an unacceptable level and, further, one low MTBF value could be
compensated by a high one. To a certain extent the limit is a tracking limit that signals the need for more detailed examination of a potential trouble area.
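
The MTBF tracking idea can be illustrated with the legible entries of Table 4. The Python sketch below compares each latest point estimate with its lower limit and reports the margin; the dictionary and function names are hypothetical, the accelerometer rows are left out because no failures were reported for them, and the values are taken as printed in Table 4 with no unit conversion.

```python
# (MTBF lower limit, latest MTBF point estimate), as listed in Table 4.
MTBF_TABLE = {
    "Roll Servo": (4000, 14000),
    "Pitch Servo": (10000, 40000),
    "Roll Computer": (1500, 1800),
    "Pitch Computer": (1500, 2900),
    "Yaw Computer": (2000, 3100),
    "ILS Receiver": (2000, 3300),
    "Radio Altimeter": (2000, 2700),
    "Vertical Gyro": (1800, 2900),
}


def below_limit(table):
    """Items whose latest estimate has fallen under the tracking limit."""
    return [item for item, (limit, estimate) in table.items()
            if estimate < limit]


def margins(table):
    """Ratio of latest estimate to lower limit for each item."""
    return {item: estimate / limit
            for item, (limit, estimate) in table.items()}


print("Items needing closer examination:", below_limit(MTBF_TABLE))
ratios = margins(MTBF_TABLE)
tightest = min(ratios, key=ratios.get)
print("Smallest margin:", tightest, ratios[tightest])
```

Consistent with the text, an item falling below its limit would be a signal for closer examination rather than an automatic failure of the certification basis, since a low value on one unit can be offset by a high value on another.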

In the main, however, the airline results have supported reliability predictions. They also indicate that with emerging maturity
the Category III system availability will achieve the L-1011 objective of better than 95 percent. A certain amount of service experience
feedback must be achieved before this level is attained, but it appears at this time to be entirely feasible. That is, given that it is desirable
to perform a Category III landing, the landing can be initiated and completed on 95 percent of these occasions. The Category II ALS
availability is, of course, of an order better than this.

TABLE 1. L-1011 AVIONIC FLIGHT CONTROL SYSTEM EQUIPMENT LIST

Stability Augmentation System (SAS)
    2   Yaw Computers
    3   Rate Gyros
    2   Aileron Position Sensors (dual)
    2   Rudder Position Sensors (dual)

Autopilot/Flight Director System (APFDS)
    2   Pitch Computers
    2   Roll Computers
    2   Pilot's Control Wheels
    2   Mode Annunciators
    2   Warning Indicators
    5   Mode Select Panel Modules
    1   Normal Accelerometer Unit (triple)
    1   Lateral Accelerometer Unit (triple)

Speed Control System (SCS)
    1   Speed Control Computer
    1   Autothrottle Servo
    1   Longitudinal Accelerometer Unit (dual)

Flight Control Electronic System (FCES)
    1   FCES Computer
    1   Trim Augmentation Computer
    2   Angle of Attack Sensors
    2   Stick Shakers
    1   Surface Position and Pitch Trim Indicator
    10  Surface Position Sensors
    3   Control Panels


TABLE 2. L-1011 DUTCH ROLL CHARACTERISTICS WITH AND WITHOUT YAW SAS

Flight conditions (mid CG); Dutch roll mode damping ratio (ζ) and damped natural frequency (fd, Hz)

                                                                SAS ON        SAS OFF
Configuration       Speed   Mach   Alt.   Weight  Flaps  Gear   ζ      fd     ζ      fd
                    (KEAS)  No.    (kft)  (klb)   (deg)                (Hz)          (Hz)

Climb               246     .45    10     404     UP     UP     .32    .13    ?      .15
Climb               356     .65    10     308.5   UP     UP     .12    .14    .12    .22
Climb               358     ?      20     400     UP     UP     .56    .15    .07    .21
Cruise              310     .86    33     350     UP     UP     .45    .18    .10    .20
Cruise              260     .86    37.5   300     UP     UP     .43    .15    .07    .18
Cruise (M_MO)       352     .90    26.5   300     UP     UP     .55    .21    .11    .24
Dive (M_D)          412     .95    21.5   350     UP     UP     .53    .25    .13    .28
Dive (M_D)          258     .95    42     300     UP     UP     .41    .17    .11    .18
Cruise (1.4 V_S)    221     .74    38     300     UP     UP     ?      .14    .05    .15
Cruise              216     .435   15     308.5   UP     UP     .33    .13    .11    .15
Descent             246     .45    10     308.5   UP     UP     .49    .13    .12    ?
Holding             256     .4     1.5    308.5   UP     UP     .50    .13    .13    .16
Holding             160     ?      10     308.5   DOWN   UP     ?      .10    .08    .13
Approach (1.3 V_S)  139     .21    0      308.5   DOWN   DOWN   .26    .10    .09    .12
Landing (1.3 V_S)   133     ?      0      308.5   DOWN   DOWN   .24    .09    .09    .12
Landing (1.3 V_S)   141     .213   0      348     DOWN   DOWN   .21    .09    .06    .12
Landing (DLC ON)    133     .2     0      308.5   DOWN   DOWN   .26    .09    .10    .12
Landing (1.4 V_S)   143     .262   10     308.5   DOWN   DOWN   .21    .09    .05    ?


TABLE 3. AUTOMATIC LANDING SYSTEM MAJOR ELEMENTS


Item                          No. Req.  Remarks

Pitch Computer                2
Roll Computer                 2         Each computer is dual channel.
Yaw Computer                  2
Roll A/P Servo                2
Pitch A/P Servo               ?         Each servo is in-line monitored.
Yaw A/P Servo                 2
Aileron Position Sensor       ?         Each sensor is dual.
Rudder Position Sensor        2         Each sensor is dual.
Yaw Rate Gyro                 3         Each has limited in-line monitoring.
Mode Annunciator              2
Warning Indicator             2
Mode Select Panel             1         5 modules.
Normal Accelerometer          3
Lateral Accelerometer         3
Attitude Gyro                 3         Each has limited in-line monitoring.
Radio Altimeter               2         Each has dual outputs with high integrity monitoring.
ILS Receiver                  2         Each has dual outputs with high integrity monitoring.
Speed Control Computer        1         Computer is dual channel.
Autothrottle Servo            1         Servo is in-line monitored.
Longitudinal Accelerometer    2
FCES Computer                 1         Provides for fail-op/fail-pass DLC.
DLC Servo                     2         Each is in-line monitored.
Trim Augmentation Computer    1         Provides for fail-op/fail-pass auto pitch trim.
Angle of Attack Sensor        2         Each has limited in-line monitoring.
Air Data Computer             2         Each has limited in-line monitoring.
Altimeter                     ?
IAS/M Indicator               ?
VSI                           2
ADI                           2
HSI                           2
Radio Altitude Indicator      2
Compass System                2
Hydraulic Source              2
Electric Source               3

TABLE 4. ESTIMATED MTBFs VS MTBF LOWER LIMITS

Item                     MTBF Lower Limit   Latest MTBF Point Est.   Mature MTBF

Roll Servo               4,000              14,000                   20,000
Pitch Servo              10,000             40,000                   49,000
Roll Computer            1,500              1,800                    3,350
Pitch Computer           1,500              2,900                    2,700
Yaw Computer             2,000              3,100                    4,600
ILS Receiver             2,000              3,300                    3,500
Radio Altimeter          2,000              2,700                    3,500
Vertical Gyro            1,800              2,900                    3,720
Lateral Accelerometer    32,000             **                       63,500
Normal Accelerometer     ?                  **                       63,500
Warning Indicator        10,000             14,000                   48,000

**No reported failures in 400,000 accelerometer flight hours


FIGURE 1. L-1011 GENERAL ARRANGEMENT


FIGURE 2. CONTROL SURFACE ARRANGEMENT


FIGURE 3. PRIMARY PITCH CONTROL SYSTEM


FIGURE 6. STABILIZER AUTOPILOT SERVO SYSTEM SCHEMATIC


FIGURE 7. ROLL/DLC CONTROL SYSTEM SCHEMATIC


FIGURE 11. YAW SAS CRUISE MODE COMPUTATION


FIGURE 20. SAS APPROACH/LAND MODE CONFIGURATION

(1) THE INDIVIDUAL SPOILER POWER SERVOS AND DRIVE LINKAGES ARE NOT ILLUSTRATED

(2) POSITION IS ACTUALLY MEASURED RELATIVE TO TRIM AS DEFINED BY THE PITCH TRIM
ACTUATOR POSITION


THE CONCORDE FLIGHT CONTROLS
par 

MM.  M.  BOSSARD  et  R.  DEQUE 
Ing^nieurs  au  Bureau  D'Etudes  de  la  Division  Avions 
AEROSPATIALE 

316,  route  de  BAYONNE,  B.P.  3153 
TOULOUSE  (FRANCE) 


SUMMARY

The CONCORDE supersonic transport aircraft, now in service, is normally flown through
redundant electrical flight controls. A standby mechanical control is available to fly
the aircraft if both electrical controls are lost.

After a description of the system, its main performance characteristics are presented.

These flight controls offer a very high level of safety; we describe the methods used to
achieve it.

AEROSPATIALE is studying in the laboratory, and soon in flight, fully electrical flight
control systems based on digital techniques, which could be applied to a second-generation
supersonic transport aircraft.

A brief description of the solutions studied is given.


1. INTRODUCTION

CONCORDE marked an important step in the speed of civil air transport, thanks to major
advances in several technical fields (aerodynamics, propulsion, structures, etc.).

The same is true of the flight controls: from the start of the project it was clear that,
given the constraints arising from the flight envelope, the aerodynamic shape and the
performance of this aircraft, electrical flight controls were required, because of their
performance, to obtain good handling qualities. Experience in flight test, and now in
commercial operation, has fully justified the choices made. The lack of sufficient experience
with electrical flight controls, even on military aircraft, led to the installation of a
standby mechanical control. The next step, now under study at AEROSPATIALE, is to remove the
constraints that this mechanical control imposes in mass, complexity and performance by using
fully electrical flight controls, which will in particular make it possible to exploit the
advantages of CCV concepts.


2. DESCRIPTION

2.1. COMPOSITION (see figures 1 and 2)

The flight controls operate the 6 trailing-edge elevons and the 2 rudders; these 6 elevons
and 2 rudders are the only control surfaces on CONCORDE. The flight controls comprise:

- electrohydraulic power control units

- an electrical control called blue, which is the normal control

- an electrical control called green, completely separate from but identical to the blue control

- a standby mechanical control

- a mechanical, hydraulic and electrical assembly used in all 3 of the preceding control modes,
comprising the pilots' controls, their interconnection and the artificial feel system

- an electrical "force" control, usable if the pilots' controls are mechanically jammed

- an artificial stabilization system for the aircraft

- a protection system against excessive angles of attack

- an electric trim, used in particular to restore static stability.

2.2. POWER CONTROL UNITS (see figure 3)

Each control surface is driven by a tandem, dual-body power control unit with a moving body.
Through a shuttle valve, each body can be supplied by either of 2 hydraulic circuits.

Each body comprises:

- a control valve, which supplies hydraulic pressure to each chamber,

- a servovalve

- a solenoid valve, which supplies hydraulic pressure to that servovalve

- a hydromechanical assembly that stabilizes the power control unit.

The control valves of the 2 bodies are mechanically linked to each other.

2.3. BLUE ELECTRICAL CONTROL (see figure 4)

It consists of several separate channels which control, respectively:

- the inner elevons

- the middle and outer elevons

- the rudders.

Each channel is made up of inductive position sensors on the pilot's control and on the
control surface, a servo amplifier and an electrical monitoring unit.

The power control units are controlled through one of the 2 servovalves mentioned above. If a
channel fails, the hydraulic supply to the corresponding servovalves is cut off and the blue
channel is replaced by its green counterpart.

Note also that all the blue channels are supplied with 1,800 Hz current by a static inverter
dedicated to them. A similar inverter supplies all the green electrical channels.
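
The reversion order described in § 2.3 (blue, then green, then the standby mechanical control of § 2.4) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Mode`, `select_mode` and `surface_modes` and the per-group health flags are hypothetical, and the sketch assumes that each surface group reverts independently and that a failed channel is simply skipped (no latching or reset logic is modelled).

```python
from enum import Enum


class Mode(Enum):
    BLUE = "blue"              # normal electrical control
    GREEN = "green"            # identical, separate electrical control
    MECHANICAL = "mechanical"  # standby mechanical control


# Surface groups served by separate channels, as listed in the text.
GROUPS = ("inner elevons", "middle and outer elevons", "rudders")


def select_mode(blue_healthy: bool, green_healthy: bool) -> Mode:
    """Choose the control mode for one surface group (hypothetical helper)."""
    if blue_healthy:
        return Mode.BLUE
    if green_healthy:
        return Mode.GREEN
    return Mode.MECHANICAL  # both electrical channels lost


def surface_modes(blue_ok: dict, green_ok: dict) -> dict:
    """Apply select_mode to every group; assumes groups revert independently."""
    return {g: select_mode(blue_ok[g], green_ok[g]) for g in GROUPS}


if __name__ == "__main__":
    blue = {g: True for g in GROUPS}
    green = {g: True for g in GROUPS}
    blue["rudders"] = False  # simulated failure of the blue rudder channel
    for group, mode in surface_modes(blue, green).items():
        print(f"{group}: {mode.value}")
```

In this sketch a single blue-channel failure moves only the affected group to green, which mirrors the statement in § 4.2 that most single failures affect only one group of surfaces.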

2.4. MECHANICAL CONTROL (see figures 5 and 6)

Routed by cables in the fuselage and by rods in the wing and fin, its particular features are:

- relay jacks for pitch, roll and yaw, located in the nose and intended on the one hand to
prevent the pilots from feeling the friction of the mechanical control, and on the other hand
to drive the controls under autopilot

- mechanical mixers which sum the pitch and roll commands to the elevons.

2.5. PILOTS' CONTROLS AND ARTIFICIAL FEEL SYSTEM

The pilots' controls (control column, wheel, pedals) are of conventional form. The pilot and
co-pilot sides are mechanically linked and drive, through a duplicated linkage, the inductive
sensors of the blue and green electrical controls and the standby mechanical control.

The artificial feel system (see figure 7) introduces at this linkage forces that depend on
control deflection, Mach number, calibrated airspeed and centre of gravity position. The forces
are produced by a spring rod and by pressure-controlled electrohydraulic jacks. The data needed
to compute the pressure are supplied by the air data computers to the system's 2 computers,
which compute the pressure and control and monitor the jacks. No single electrical or hydraulic
failure changes the forces felt at the pilots' controls.

Force cancellation is carried out with the pilots' controls at a constant position. In pitch,
it can be obtained through either of the two electrical controls or, as a standby, through a
mechanical control.

In roll and yaw, the force cancellation control is purely mechanical.

2.6. FORCE CONTROL

The pilot's and co-pilot's control columns are fitted with a force-sensing hub. If the columns
or the wheels jam, once the pilot has engaged the system, the forces applied to a wheel are
converted into electrical commands by the hub. These commands, received by the high
angle-of-attack protection computers (see below), command the deflection of the elevons.

2.7. ARTIFICIAL STABILIZATION (see figure 8)

This system is fully duplicated. Each half-system comprises:

- pitch, roll and yaw rate gyros

- 1 lateral accelerometer

- 1 computer

In addition to the rate gyro and accelerometer signals, the computers receive Mach number and
angle-of-attack data.

Each half-system supplies the electrical flight controls with commands for aircraft
stabilization, improved response time and turn coordination. In addition, through accelerometer
sensing, it reduces the transient sideslip that follows an engine failure by commanding an
automatic rudder deflection.

The computers are of the dual self-monitored type; each can feed either the blue or the green
electrical channels. Failures of the rate gyros and accelerometers, and most computer failures,
do not cause switching between the blue and green electrical channels.
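
The text does not say how the "dual self-monitored" computers of § 2.7 detect their own failures. One common arrangement, shown here purely as an assumption, is to compute each command in two lanes and declare the computer failed when the lanes disagree by more than a tolerance. The function names, the tolerance and the fallback value of zero are hypothetical.

```python
from typing import Optional


def self_monitored_output(command_lane: float, monitor_lane: float,
                          tolerance: float) -> Optional[float]:
    """Return the command lane value if the monitor lane agrees with it.

    Returns None when the two lanes differ by more than `tolerance`,
    which stands for the computer declaring itself failed.
    """
    if abs(command_lane - monitor_lane) > tolerance:
        return None
    return command_lane


def stabilization_command(half_1: Optional[float],
                          half_2: Optional[float]) -> float:
    """Use the first half-system with a valid output; 0.0 if both have failed."""
    for output in (half_1, half_2):
        if output is not None:
            return output
    return 0.0  # assumed behaviour: no stabilization command at all


if __name__ == "__main__":
    failed = self_monitored_output(1.2, 3.0, tolerance=0.2)   # lanes disagree
    healthy = self_monitored_output(1.2, 1.25, tolerance=0.2)  # lanes agree
    print(stabilization_command(failed, healthy))
```

Under this assumption a computer failure removes only that half-system, which is consistent with the statement that most computer failures do not switch the blue and green channels.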


2.8. PROTECTION AGAINST EXCESSIVE ANGLES OF ATTACK


The system generates:

- at relatively low angle of attack, vibrations at the control column (conventional device)

- at higher angle of attack, an unmistakable warning consisting of violent and rapid
alternating forces at the control column. These forces exist only when the column is in a
nose-up position.

- a trim run as a function of angle of attack


It also commands, when necessary:

- an increase in the commands of the pitch and yaw stabilizers

- disconnection of the autopilot and inhibition of the automatic trim.

The system consists essentially of two self-monitored analogue computers. In addition to the
functions of protection against excessive angles of attack, these computers process the force
control commands (see § 6 above).

2.9. ELECTRIC TRIM

In conventional fashion, the electric trim computers (two in number, self-monitored) command:

- cancellation of forces in steady flight, on the pilot's commands in manual flight and
automatically under autopilot

- restoration of the regulatory static stability in terms of force.

The commands are executed by 2 monitored electric motors (one operating, the other on standby),
which alter the kinematic linkage between the control columns on the one hand and the spring
rod and artificial feel jacks on the other (see § 5 above).


3. PERFORMANCE

The definition of the CONCORDE flight controls results from particularly severe performance
requirements linked to the characteristics of the aircraft and to its operating envelope. In
particular:

- a particularly wide flight envelope: see figure 9 for the ranges of altitude and Mach number,
which lead to large variations in the aircraft's characteristics (see fig. 10 for the variation
of control deflection per g)

- a wide variation of static margins with centre of gravity position and flight conditions
(see fig. 19).

3.1. PERFORMANCE OF THE CONTROL SURFACE CONTROL CHANNELS

This led us to impose elevon control accuracy requirements (hysteresis = ± 5') which could be
met thanks to the electrical control.

In mechanical control, degraded accuracy had to be accepted (hysteresis = ± 24'). This allows
the aircraft to be flown within the normal flight envelope with degraded comfort and with
special instructions, notably on the centre of gravity limits to be observed.

The size of the aircraft and its inertia are such that the dynamic performance requirements
appear relatively modest, but they could be met only thanks to special devices for stabilizing
the power control units:

- power control unit bandwidth: phase lag at 2 Hz in electrical mode ≤ 45°

- maximum rate without aerodynamic load: 28 to 40 °/s for the elevon power control units,
46 to 51 °/s for the rudder power control units.

3.2. PERFORMANCE OF THE AUTOMATIC PILOTING AID SYSTEMS

The automatic piloting aid systems were initially defined from the characteristics of the
aircraft and from the various handling-qualities criteria found in the literature. Very early
in the study of the aircraft, sophisticated simulation facilities were set up. It was these
facilities, which brought in the judgements of the pilots, that made it possible to define the
systems satisfactorily. The systems were sufficiently developed by the first flight that they
could be engaged from the first take-off, and their definition changed relatively little
afterwards, except in certain cases mentioned below.

3.2.1. ARTIFICIAL FEEL SYSTEM

Its definition results from a compromise between the pilots' wish for low forces in normal
manoeuvres, varying little with flight and loading conditions, and the need to meet the
following requirements in pitch and yaw:

PITCH: Force at 1.6 g < 230 N

       Force at 2.5 g ≥ 230 N

YAW: Force [comparison symbol illegible] 900 N for the limit deflections permitted by
structural strength.

In yaw, these compromises could be reached only by introducing a 2nd force threshold, a
function of flight conditions, slightly before the limit deflection permitted by the structure.

In roll, the structure permits full control surface travel.

In pitch, these requirements led to forces per g of between 200 and 300 N/g (see fig. 12).
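
The pitch requirements of § 3.2.1 can be checked against the quoted range of 200 to 300 N/g with a short calculation. The sketch below rests on assumptions that the text does not state: the force is taken as zero in trimmed 1 g flight and as growing linearly with load factor, and the comparison symbol lost from the 2.5 g requirement is read as "at least 230 N". The function names are hypothetical.

```python
FORCE_LIMIT_N = 230.0  # force value quoted for both pitch requirements


def stick_force(load_factor: float, force_per_g: float) -> float:
    """Force in N to hold `load_factor`, assuming zero force at 1 g
    and a constant gradient `force_per_g` in N/g."""
    return force_per_g * (load_factor - 1.0)


def gradient_bounds() -> tuple:
    """Range of gradients (N/g) that satisfies both pitch requirements
    under the linear assumption above."""
    lowest = FORCE_LIMIT_N / (2.5 - 1.0)   # at least 230 N at 2.5 g
    highest = FORCE_LIMIT_N / (1.6 - 1.0)  # less than 230 N at 1.6 g
    return lowest, highest


if __name__ == "__main__":
    low, high = gradient_bounds()
    print(f"admissible gradient: {low:.0f} to {high:.0f} N/g")
    for gradient in (200.0, 300.0):  # range quoted in the text
        f16 = stick_force(1.6, gradient)
        f25 = stick_force(2.5, gradient)
        print(f"{gradient:.0f} N/g: {f16:.0f} N at 1.6 g, {f25:.0f} N at 2.5 g")
```

Under these assumptions the admissible gradient runs from roughly 153 to 383 N/g, so the quoted 200 to 300 N/g lies inside it with margin on both sides.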

3.2.2. TRIM SYSTEM

In addition to the usual trim function of cancelling forces, the trim system provides
artificial stick-free static stability. Apart from its static instability in the transonic
regime, CONCORDE has static stability close to neutral (slightly positive or slightly negative)
in most flight phases. Although pilots generally did not dislike this, it was not accepted by
the certification authorities, and positive stick-free static stability had to be restored.

On the other hand, so as not to degrade the handling characteristics, this force stability must
always remain moderate. The search for a compromise was particularly difficult: this stability
depends on many parameters (Mach number, Vc, mass, centre of gravity position) and on their
combination, and taking them all into account would have led to excessive complexity. The
compromise adopted gave the results presented in fig. 13.

3.2.3. AUTOSTABILIZATION SYSTEM

The main function of the system is to increase the damping of the short-period oscillations in
roll, yaw and pitch. The system has limited authority, for safety reasons that are more
psychological than real given the very high level of self-monitoring of the system.

The gains and time constants of the autostabilizers were optimized progressively, essentially
on the basis of pilot judgements, first on the simulator and then in flight.

Some differences were noted between the judgements made on the simulator and in flight:

- Some pilots observed in flight, on approach, a tendency to pilot-induced oscillation (period
4 to 5 s) which, under the same conditions, could never be observed on the simulator; the
optimization of the pitch stabilizer laws to remove this tendency therefore had to be carried
out entirely in flight.

- From the first simulator tests, a possibility of lateral pilot-induced oscillation was
observed in the transonic regime, due mainly to the strong direct yaw induced by the inner
elevons ([symbol illegible] high).

The distribution of roll control deflection among the various elevons and the autostabilization
laws were therefore adjusted to eliminate the problem. This phenomenon was also observed in
flight, but with a markedly lower degree of severity, which led to settings different from
those initially proposed on the simulator.

The pitch autostabilizer was also used to damp the 1st symmetric structural mode of the
fuselage (1.5 to 2 Hz), which was not planned in the initial definition. The first flights made
in heavy turbulence had revealed a troublesome level of discomfort on the flight deck due to
this mode. The damping of the mode was increased artificially (at least doubled) by careful
positioning of the rate gyro and by adapting the filtering to avoid coupling with other modes
of higher frequency. In fact, it is the stability of a mode created by the system (0.8 to 1 Hz)
that limits the achievable gains.

The influence of the stabilization system, with its production settings, on the dynamic
characteristics of the aircraft is shown in figures 14 to 17.

3.2.4. HIGH ANGLE-OF-ATTACK PROTECTION SYSTEM

The system was set on the one hand to allow the regulatory manoeuvres at approach and take-off
speeds (1.5 g at VREF, [illegible] g at V2 and 1.6 g at [illegible]), and on the other hand to
ensure that abrupt manoeuvres performed at these speeds, in practice very rapid application of
the maximum control deflections, do not lead to exceeding the maximum authorized angle of
attack of 19.5°.


4. SAFETY

4.1. OBJECTIVES

a) no failure with a probability ≥ 10^-[exponent illegible]/h of flight may cause a significant
degradation of handling qualities or an increase in crew workload

b) failures putting the aircraft in a critical configuration must be Extremely Remote
(probability < 10^-7/h of flight)

c) failures putting the aircraft in a catastrophic configuration must be Extremely Improbable
(probability ≤ 10^-9/h of flight).

4.2. IMPLEMENTATION

The flight control system described in § 2 was designed from the outset to meet the above
objectives. To that end the following principles were applied:

- REDUNDANCY

For example, each control surface can be controlled through 3 independent channels. In
addition, most single failures affect only the inner elevons or only the outer and middle
elevons. The automatic piloting aid systems are duplicated.

- SELF-MONITORING

Each piloting channel or automatic piloting aid channel is able to detect its own failures with
a very high probability, so that in the event of a failure it can automatically hand over to a
healthy channel. It should be noted that the standby mechanical control plays no part in
monitoring the electrical channels.

- PARTITIONING

To benefit from the advantages of redundancy, very thorough segregation is necessary, in power
supplies as well as in equipment and wiring. In addition, to minimize the influence of
failures, the flight control channels and their monitoring are separated by group of control
surfaces (inner elevons, outer and middle elevons, rudders).

- AUTHORITY LIMITATION

The automatic piloting aid systems have their authority physically limited to what is strictly
necessary to perform their function.

In addition to these principles, all the rules of good practice were applied with great care,
both in the manufacture and installation of the mechanical components and for the electrical
and electronic components.
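
The effect of the authority limitation principle of § 4.2 can be pictured with a small Python sketch. The source says the limitation is physical; the code below only models its effect as a clamp on the aid-system command, and the function names and numerical values are hypothetical.

```python
def limit_authority(command_deg: float, authority_deg: float) -> float:
    """Clamp an aid-system command to the range +/- authority_deg."""
    if authority_deg < 0:
        raise ValueError("authority_deg must be non-negative")
    return max(-authority_deg, min(authority_deg, command_deg))


def surface_demand(pilot_deg: float, aid_deg: float,
                   authority_deg: float) -> float:
    """Sum the pilot's demand and the authority-limited aid-system command."""
    return pilot_deg + limit_authority(aid_deg, authority_deg)


if __name__ == "__main__":
    # A runaway aid-system command can move the surface by no more than
    # its authority (2 degrees here, an illustrative value only).
    print(surface_demand(pilot_deg=10.0, aid_deg=99.0, authority_deg=2.0))
```

The point of the sketch is that an undetected fault in an aid system changes the surface demand by at most the chosen authority, whatever value the faulty command takes.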

4.3. VERIFICATION

4.3.1. FAILURE ANALYSIS

The flight controls, like all the systems of the aircraft, were the subject of very detailed
failure analyses using a method developed for the CONCORDE programme. This method consists of
considering and combining the failures of the elementary components or constituents, each taken
with its probability of occurrence, to determine the nature and probability of the various
types of system failure (global failures). The influence of elements external to the system is
also taken into account. An extract from the list of global failures is given in figure 18.
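
The idea of combining elementary failure probabilities into global ones (§ 4.3.1) can be sketched in Python as follows. This is not the method developed for the CONCORDE programme, which the text does not detail. It assumes independent failures, treats per-flight-hour probabilities as simple event probabilities, and uses invented component values; only the 10^-7 and 10^-9 thresholds come from § 4.1.

```python
from math import prod


def p_any(*probs: float) -> float:
    """Probability that at least one of several independent failures occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


def p_all(*probs: float) -> float:
    """Probability that all of several independent failures occur together."""
    return prod(probs)


def classify(p_per_hour: float) -> str:
    """Compare a global failure probability with objectives b) and c) of § 4.1."""
    if p_per_hour <= 1e-9:
        return "extremely improbable"
    if p_per_hour < 1e-7:
        return "extremely remote"
    return "more frequent than extremely remote"


# Hypothetical per-flight-hour failure probabilities for the parts of one
# electrical channel (illustrative values, not from the source).
SENSOR, AMPLIFIER, SUPPLY = 4e-5, 5e-5, 1e-5

channel_loss = p_any(SENSOR, AMPLIFIER, SUPPLY)  # any part failing loses the channel
both_lost = p_all(channel_loss, channel_loss)    # blue and green lost together

if __name__ == "__main__":
    print(f"one channel lost:   {channel_loss:.2e} per hour")
    print(f"both channels lost: {both_lost:.2e} per hour ({classify(both_lost)})")
```

The independence assumption is exactly what the segregation described under PARTITIONING in § 4.2 is meant to justify; a common supply or shared wiring would invalidate the product in `p_all`.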

4.3.2. STUDY OF FAILURE CONSEQUENCES

The failure configurations resulting from the safety analysis were reproduced on a flight
simulator incorporating almost all of the real flight control equipment. These failures were
introduced in the various flight conditions so that the pilots, both manufacturers' and
official, could judge the nature of the consequences and verify in particular that they were
consistent with the safety objectives. The validity of the simulation had of course been
demonstrated beforehand by comparison with flight.

Following these tests, a number of these failures were selected for flight test, either because
judgement on the simulator is difficult (pilot-induced oscillation problems, for example) or
because the consequences, given the probability, are significant enough to merit a judgement in
flight. Certain failure configurations, in particular flying the aircraft in mechanical
control, were the subject of very many flight tests.

4.3.3. TESTS

All the equipment and systems underwent severe qualification tests: environment, endurance,
reliability.

The complete set of mechanical flight control components, assembled on the flight control test
rig, has already undergone endurance tests corresponding to 16,000 flight hours.

The total number of flight hours accumulated on the prototype, pre-production and production
aircraft without a major flight control incident currently exceeds 7,000 h. To date no failure
has required the aircraft to be flown in mechanical control.

5. EVOLUTION

As intended from the start, the CONCORDE flight controls allow the aircraft to be controlled
throughout the flight envelope in mechanical mode, and therefore notably without artificial
stabilization. Since loss of the electrical control is not dangerous, its definition is fairly
simple (very simple indeed for the purely electrical part). Its certification raised no
particular problems, and its components could be chosen from among those with which the
airlines are familiar. The drawback is fairly obvious. On the one hand, the coexistence of a
normal electrical system with a standby mechanical system creates some complications. On the
other hand, and above all, the wish to be able to fly the aircraft in standby mode without the
aid of electronics imposes constraints on the definition of the aircraft, which must have
satisfactory "natural" handling qualities.

AEROSPATIALE, with the help of the Service Technique de l'Aéronautique, has therefore
undertaken the study of fully electrical flight controls.

We shall not dwell on the motivations for such controls, which have often been set out.

Let us simply recall a few headings:

- the possibility of flying with an aft centre of gravity, and hence of increasing the
lift-to-drag ratio at take-off, reducing trim drag in cruise and reducing the size of the
horizontal tail

- the possibility of optimally distributing the aerodynamic manoeuvre loads

- the possibility of damping wing flutter or fuselage structural modes

- the possibility of reducing the natural directional stability of the aircraft and hence the fin

- the ease of implementing gust alleviation and lift modulation.

These possibilities are important enough that, at least in some programmes, they are taken into
account in the general definition of the aircraft (we believe this will be the case in
particular for the next generation of supersonic transport). These possibilities must therefore
be evaluated as precisely as possible even before a programme is chosen.

Evaluating will mean determining what the technology can do, defining a flight control
architecture and assessing the ease of piloting. In civil aviation, the tangible conclusion of
this evaluation will be economic: for each aircraft defined by its mission, the economic
benefit of purely electrical flight controls will have to be judged.

Of course, once these controls have a major effect on the definition of the aircraft, their
price itself may be somewhat secondary. That is nevertheless no reason to neglect it. One is
thus led to define the minimum means that provide safety and performance.

With this in mind, AEROSPATIALE, after a first comparative digital/analogue evaluation, tested
several digital computers in the laboratory. One clear conclusion emerged from these tests: at
least one digital computer exists that is satisfactory for implementing purely electrical
flight controls. Satisfactory here means:

- programming is easy enough that the modifications inevitable during the development of an
aircraft can be made as easily (simple modifications) or much more easily (major modifications)
than with an analogue computer.

- programming is simple enough that the engineers responsible for flight controls can check it
just as they can read analogue diagrams, to which it is in fact very similar.

- the computing speed allows all the aircraft stabilization computations, even when structural
mode responses have to be filtered.

- reliability "per litre" is of the same order as that of good analogue computers (which
represents notable progress since, at least in the application considered here, a litre of
digital does considerably more work than a litre of analogue).

Downstream of the computers, the flight controls need power control units. This is a species
with many families. After the "simplex" family of CONCORDE (and the "triplex", but mechanically
controlled, family of the AIRBUS), AEROSPATIALE has tested in its laboratories a member of the
"quadruplex" family, whose performance proved good, and even surprisingly good as regards
stability. This power control unit is currently undergoing endurance tests.

Upstream of the computers, one faces not merely many families but almost an infinity: that of
the pilots' controls. Put together a few parameters such as position, force, shape, controlled
motions (pitch/roll or pitch/roll/yaw), associated control laws, importance given to automatic
or transparent piloting, pilot morphology, technological possibilities (those too!) and
consider, for reassurance, that a solution will be found because it must be. And so, since we
want to try control laws in flight, a mini-stick has been built and tried on the flight
simulator. Of course, it is very good for those who originally chose its type.

A twin of this mini-stick is planned to be installed in a few months in a CONCORDE aircraft for
control-law experiments. It will be accompanied by two digital computers of the type mentioned
above. The tests will make it possible to explore a flight envelope outside that of the
production aircraft.

Through all the work mentioned above, AEROSPATIALE is thus preparing for the use of purely
electrical flight controls. These will not, moreover, be a revolution but rather an evolution
from the flight controls of the production CONCORDE type 1, whose normal, electrical control,
it should be recalled, has full authority and fully performs its own monitoring.


A HIGH-RELIABILITY, HIGH-INTEGRITY FLIGHT CONTROL SYSTEM FOR HELICOPTERS

SUMMARY

This chapter briefly describes some of the operations which helicopters may be required to carry
out at night and in poor visibility. Because of the very high pilot work load likely to arise in these
situations, it is argued that the helicopter should be equipped with an autostabilisation system having
a defect-survival capability. One system to meet this requirement, together with quantitative system
reliability and integrity requirements, has been developed and manufactured by Smiths Industries Ltd for
a Sea King helicopter at the Royal Aircraft Establishment, Farnborough; flight trials have just commenced.
This system is triplex, with digital computation, and has the development potential to include autopilot
facilities, more sophisticated control techniques and extended system redundancy. The redundancy
philosophy and the approach to assessment of system reliability and integrity are described, together
with salient design and engineering details of the system. Also an indication is given of future trends
in the technology.

1 UTTHOroCTIOR 

Hellooptara,  T/STOL  and  CTOL  alroraft,  aapeoially  tha  latter,  bare  for  aany  yaara  been  operating 
auooaaafully  at  nlg^t.  Por  aany  CTOL  alroraft  thla  haa  Inoludad  poor  riaibllity  operation,  hoaerar 
apart  froa  apeoialiaad  ASV  rahlolaa,  tha  majority  of  hellooptara  are  atlll  Halted  to  good  aaathar 
oondltlona  and  partlonlarly  good  riaibllity.  Tha  raaaona  for  thla  lack  of  progreaa  are  many. 

What justifies the current activity in this field is largely the increased emphasis on operations
which helicopters are now being required to carry out under IFR conditions. These are operations which
not many years ago would not have been contemplated in conditions other than good, day-time visibility.
There is now a real possibility that, with the help of developments in electronic vision aids, limited
visibility operations can be attempted.

It is the aim of this paper to postulate some implications on the flight control system of
operating helicopters at night and in poor visibility. The principal requirements for the flight
control system, to alleviate the piloting problems in these conditions, will be discussed and the second
part of the paper describes one particular solution to these requirements, namely the use of a triplex,
digital autostabiliser currently under flight trials in a Sea King helicopter.


2 HELICOPTER OPERATIONS IN POOR VISIBILITY

In the UK, it is probably true to say that the Army Air Corps have the most pressing requirements
to extend their operations at night time and in poor visibility. Army helicopters are employed in many
roles, most of which require extremely low-level flying and agility of manoeuvre at some stage of the
mission. It is these two features which will undoubtedly give rise to problems in conditions other
than clear daylight.

Up to the present time, the Army (light/medium) helicopters have not been equipped with
autostabilisation. The introduction into service of the WG13 'Lynx' marks the change, with the
comprehensive, duplex AFCS fitted to all variants.

The Royal Navy on the other hand, has for many years been successfully operating helicopters over
the sea at night and in poor visibility. Wasp, Wessex and Sea King helicopters are all equipped with an
AFCS; for the ASW role the Wessex and Sea King equipment includes the facility to descend, transition
and hover automatically at very low heights. With the exception of the duplex system in the Wessex Mk.3
(and the Lynx, when in service) a simplex, limited authority AFCS has always been fitted.
The AFCS fitted to Royal Marine Commando helicopters has not differed from that fitted to the
corresponding Royal Navy aircraft. The most demanding Commando role in terms of night/poor visibility
operation is the amphibious assault, since once again it requires very low flying over both sea and land.

It is apparent that many of the problems associated with operation at night and in poor
visibility arise from the helicopter's low inherent stability, which requires continuous monitoring of
vehicle attitude together with frequent control action, especially in turbulent conditions. In clear
weather, much of the required attitude information is derived from external references and this action
is easily integrated with other visual functions both inside and outside the cockpit. However, even in
'limited' instrument flight (e.g. cruising at altitudes well above obstacle heights) the tasks of
continuously monitoring and maintaining attitude from an artificial horizon, monitoring other flight,
engine and systems instruments, operating radios and navigating, present a very full work load. If in
addition, operations are to be extended to encompass some of the previous low-level roles, for which it
is essential for the pilot to 'see' the ground ahead, then the problem is even more acute.

It is likely that the first viable means of providing the required forward vision will be a TV
sensor with a head-down CRT presentation to the pilot. In fact, with the exception of night goggles or
other helmet-mounted devices, head-down CRT presentation of outside world information could be the
common feature of the various sensing systems currently under development. It is self-evident that none
of these displays can match the total capabilities of the human eye in good visibility and in particular,
a compromise must always be sought between resolution and field of view. Thus in addition to the work
load briefly described already, must be added the task of closely monitoring the CRT in order to interpret
ground features, detect obstacles (power cables in particular) and targets, judge clearance heights and
so on. In fact this task alone could occupy a major portion of the pilot's time since the quality of the
CRT image will be such that close and continuous scrutiny will be required, as opposed to the clear
visibility situation where a quick scan augmented by peripheral vision capability is adequate.

In order to reduce the work load to a tolerable level, the benefits of automatic stabilisation must
be exploited. By providing autostabilisation in pitch, roll and yaw, the pilot is relieved of the task of
making continuous control adjustments to correct for short term disturbances, the handling is improved and
a 'hands-off' capability is acquired when flight conditions allow this. Further gains can be made by the
addition of autopilot modes to the basic stabiliser but the selection of the appropriate modes requires
careful consideration in the strict context of the roles previously described, since the simpler modes
(such as altitude hold and acquire) may be of limited use in those regimes where the work load is highest.
Approach modes and transition/hover modes (for Naval applications) constitute notable exceptions however.

The maximum reduction in work load can only be achieved if the pilot has considerable confidence in
the autostabiliser and its integrity. Several factors affect the confidence:

(i) the behaviour of the system and aircraft following an autostabiliser failure. This can be
separated into the immediate consequence, i.e. the severity of the transient disturbance in
height, speed and attitude, and then the ensuing degradation due to reversion to the unstabilised helicopter,

(ii) the probability of a failure occurring during the mission,

(iii) the existence or otherwise of a warning indication of the failure occurrence.

The important question of integrity is discussed later but in the present context it must be
stated that a viable autostabiliser for night and poor visibility operations must meet the requirements
implied in the above factors.

3 AFCS REQUIREMENTS FOR NIGHT/POOR VISIBILITY OPERATION

In this section, some requirements of an AFCS, relevant to poor visibility operation of helicopters
in the preceding roles, are discussed briefly.

3.1  Handling 

Since this is the justification for incurring the cost/weight penalties of an AFCS, the benefits
derived must be substantial. Recent trends and future requirements in helicopter design and operations
(such as poor visibility operation, long sorties, single pilot operation, higher speeds and increased
agility) depend on the enhanced flying qualities derived from autostabilisation. Satisfactory handling
is achieved by the use of relatively simple and well established control techniques of attitude and rate
feedback. There has been little requirement and consequently little effort (compared to fixed wing) to
develop more sophisticated control techniques. However, trends towards increased agility and the
increasing use of the helicopter as a weapons platform will transform this situation.

3.2  Integrity 

Until more experience is gained in the night/poor visibility operation of helicopters in roles such
as those described previously, it will not be possible to say how dependent the mission will be on the
functioning of the AFCS. It is suggested that many missions could still be completed in the event of an
AFCS failure, albeit with greatly increased work load and degraded performance. Unfortunately, the
performance degradation, particularly as measured against the requirement to avoid detection, may well
decrease the probability of surviving the mission.

What are crucial, however, are the failure transient characteristics of the system, particularly in
the low-flying situation. If the pilot is aware that a single defect in the system can result in rapid
and dangerous changes in attitude and height unless he intervenes immediately, then the confidence that
is so necessary for the full benefits of the system to be realised will be lost.

An AFCS runaway in a simplex system would almost certainly be unacceptable, even with a modest AFCS
authority, therefore some degree of redundancy is essential. Although a duplex, fail-soft configuration
offers substantial improvements over a simplex system there is a strong case in favour of a three-lane
system. The duplex system is capable of providing a failure warning indication and produces tolerable
failure transients in many situations but it must be left to the pilot to decide on and isolate the faulty
lane, possibly in a high work load situation. A three-lane system on the other hand can be designed with
negligible failure transients and both detection and isolation of the first defect or its suppression is
rapid and automatic. The term "three-lane" is used rather than "triplex" to permit the latter to stand
for the type of parallel operation in which a decision that a lane ceases to contribute to the system
output is based, substantially if not wholly, on the relative behaviour of all three lanes.

3.3  Authority 

Adequate stabilisation is achieved on current helicopters with modest authorities of the order of
10%. When radically different vehicle designs are contemplated and the requirement arises for revised
control techniques (e.g. manoeuvre demand), authorities of this magnitude will no longer suffice. Higher
authority would of course demand consideration of the overall system characteristics including power and
hydraulic supplies.

3.4 Test and maintenance

One of the arguments against additional AFCS complexity implied by redundancy of equipment
is that it necessarily decreases the overall reliability of the system in terms of MTBD and consequently
increases the maintenance man hours per flying hour. In addition, multilane systems can suffer from the
problem of 'nuisance disconnects' arising from inevitable equipment tolerances. These can give rise to
reduced system redundancy, with no defect revealed on investigation.

However, there is much that can be done to minimise the problems. The inclusion of built-in-test
equipment can provide a comprehensive pre-flight check with a GO/NO-GO indication of system functionability
and redundancy. Likewise, continuous in-flight monitoring can provide a high degree of both fault
detection and location down to LRU level. Finally, the same facility can provide a first-line test
capability without the need for additional test equipment (in the multilane configuration).

By suitable design, the occurrence of nuisance disconnects can be considerably reduced, especially
if the system is implemented digitally, in which case built-in-test can be incorporated with fewer additional
components than in an analogue system. Also, for given component tolerances, the comparison of
like-with-like in a multiplex configuration is less susceptible to nuisance disconnects than the
self-monitoring in a multiplicate configuration.

3.5 Cost and weight

There is little of a general nature that can be said about this, other than the obvious desire to
minimise both. Rarely is something gained for nothing and it must be accepted that if a high reliability,
high integrity AFCS can substantially increase the feasibility of safe night/poor visibility operation
then the increase in cost and weight of the system is the price to be paid.

4 THE RAE (FLIGHT SYSTEMS) SEA KING PROGRAMME

4.1  General 

A Sea King Mk. 1 helicopter is currently being used at RAE for trials of helicopter avionic systems
and is being equipped with a range of displays, controls, navigation and other equipment. The central
theme of the programme is the need to develop the various equipments to meet the total requirement for
night/poor visibility operation. In addition it can be used to evaluate equipment and procedures for
helicopters being introduced into service and to investigate ad hoc problems of in-service helicopters.

4.2  Controls  programme 

The long term AFCS requirements of UK military helicopters are under continuous review, in order that
the work centred on the Sea King can be aimed to meet them. Two areas in which future development is seen
to be most likely are the introduction of a greater degree of system redundancy and the trend towards
digital implementation of systems.

With this in mind, a specification was issued for a defect-survival autostabiliser (DSAS) to be
developed for installation and flight trials in the RAE Sea King. It was made clear that a digital
solution would be preferred in order to encourage development in this area and to gain some early flight
experience of digital techniques applied to flight control systems.

At the same time as meeting these research objectives it was felt that defect-survival stabilisation
was desirable in the RAE Sea King for carrying out other areas of the overall programme. The contract
was awarded to the Aviation Division of Smiths Industries Ltd., the equipment was delivered at the end of
197[?], and flight trials have just commenced.

4.3 Particular requirements of Sea King DSAS

The performance requirements for the DSAS in terms of attitude holding, stability, response to pilot
demand etc. are such that they can be met in a conventional manner using attitude and rate feedback. No
new performance criteria involving additional sensors and more sophisticated control philosophies were
required for the initial part of the trials, but will be considered subsequently.

In terms of reliability of the multiplex mode, it is specified that any single defect of the system
shall have negligible effect on the autostabilisation performance. In particular, the maximum transient
deviations following a single defect in any axis during trimmed, 'hands-off' flight should not differ from
the nominal, trimmed values by more than the following amounts:-

The pilot should be informed of the occurrence of the first defect and should be able to assume
manual control of the aircraft in the event of a system failure, allowing for an intervention time of
1.3 s. Other aspects of the integrity of the multiplex mode are that the total risk of multiple
defects, which result in deviation in two or three axes from the nominal, trimmed values exceeding the
following amounts, should be less than 10^-[?] per flying hour:-


The reliability of the AFCS currently fitted in the Sea King is good, and one of the design aims
for the DSAS is to achieve similar standards of reliability whilst at the same time effecting significant
improvements in mission success by ensuring a lower probability of losing the autostabilisation facility;
specifically an MTBD in excess of 400 flying hours and an MTBF of more than 10^[?] flying hours in the
multiplex mode. The test facilities outlined in section 3.4 form part of the DSAS specification.

In order to maintain as short a development timescale as possible, so as to acquire early flight
experience, the original requirement to carry the defect survival philosophy into the actuation area
has been postponed. Therefore the DSAS is multiplexed to the extent of sensing and computation only.
The interface with the existing simplex actuators and power controls in the Sea King will be described in
the second part of the paper. Finally, it was required that it should be possible to integrate the new
autostabilisation system with the autopilot facilities of the existing AFCS.

4.4  DSAS  flight  trials 

After installation of the system in the Sea King, ground checks will be followed by preliminary
familiarisation flights over the flight envelope which will include an assessment of the failure transients
in order to clear the system for further work. The performance of the system, as a stabiliser, will be
evaluated in terms of stability, attitude holding and handling over a range of turbulence levels and flight
conditions.

A detailed study will be made of the system behaviour following any genuine defect and also specific,
induced, 'defects' that will be selected to appear in critical areas of the system. Limited access to
data within the computers will be available for recording purposes, a particular interest being sensor
signals before and after consolidation. Thus the consolidation process can be assessed and the comparator
settings inherent in this process can be confirmed. A range of comparator values can be selected in
order to investigate nuisance disconnect problems.

Because the equipment is digital, its immunity to electrical noise in the aircraft environment will
require examination, as will the actuator response to quantised inputs (a separate rig exercise using Sea
King actuators checked this latter point, during the development of the system).

One feature that it will not be possible to check in flight to a significant level of confidence is
the overall system reliability. This is unfortunate but is due to the relatively small number of flying
hours that can be achieved in any experimental aircraft.

5 DIGITAL COMPUTER DEVELOPMENT

The Aviation Division of Smiths Industries Ltd. has been developing special purpose digital computers
for airborne applications over the last decade, including those for the Jaguar, Harrier and MRCA Head-Up
display systems, a duplex digital installation for an engine control system and, more recently, computers
for a triplex AFCS providing all of the autopilot modes required by present generation transport aircraft,
including automatic landing in Category III conditions. At the present time development of small processors
using LSI technology is proceeding within the Company.

The ground rig evaluation of the triplex AFCS system mentioned above was successfully completed in
1973, and this digital computer, the SDC 10, was selected for further development for the Sea King system.

6 THE REDUNDANCY PHILOSOPHY

6.1 Because digital computing techniques are to be used, self-monitoring lanes can be considered.
In particular two such lanes could be used in a parallel-duplicate configuration, where this stands for the
type of parallel operation in which a decision that a lane ceases to contribute to the system output is
based, substantially if not wholly, on its own behaviour. However, the level of self-monitoring required
is so high, e.g. 99.94% for a lane MTBD of 1200 flying hours, that even its achievement is questionable, let
alone the demonstration by failure analysis of such achievability. With currently available equipment, it
is considered that the triplex configuration selected for this installation represents the level
of redundancy providing a defect-survival system capable of achieving the mission-success probability,
whilst also providing a fail-soft characteristic following a second defect.

If giving a warning following a single defect had not been a requirement, then a parallel-triplicate
configuration could have been considered. The level of self-monitoring required is now practicable, e.g.
93%* for a lane MTBD of 1200 flying hours. However the work involved in failure analysis would be much
greater than for a triplex configuration; also the parallel-triplicate configuration would provide
fail-operational or fail-soft behaviour on only 69% of occurrences of two defects, which might well be as
unacceptable as a warning on only 93% of occurrences of a single defect.
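
The reliability figures quoted in sections 4.3 and 6.1 can be cross-checked with simple arithmetic.
The sketch below is illustrative only: it assumes constant, independent defect rates per lane, which
the text does not state, and the function names are hypothetical. Under that assumption three lanes of
1200 flying hours MTBD each give a system MTBD of 400 flying hours, and a self-monitoring level of
99.94% in a two-lane arrangement leaves roughly one unmonitored defect per million flying hours. It is
not a reproduction of the failure analysis referred to in the paper.

```python
def system_mtbd(lane_mtbd_hours: float, lanes: int) -> float:
    """Mean time between defects for `lanes` identical lanes.

    Assumes constant defect rates that simply add across lanes.
    """
    return lane_mtbd_hours / lanes


def unmonitored_defect_rate(lane_mtbd_hours: float, lanes: int,
                            coverage: float) -> float:
    """Rate per flying hour of defects that escape self-monitoring.

    `coverage` is the fraction of lane defects detected in-lane (0..1).
    """
    return lanes * (1.0 - coverage) / lane_mtbd_hours


if __name__ == "__main__":
    print("triplex system MTBD (h):", system_mtbd(1200.0, 3))
    print("duplicate unmonitored rate (/h):",
          unmonitored_defect_rate(1200.0, 2, 0.9994))
```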

Dependence on in-lane monitoring would require each sensor to be more complex than for triplex, in that
it is either self-monitored or interrogatable by its computer in flight. This could be avoided of course
by cross-comparing and amalgamating sensor signals as for triplex, but one would then sacrifice the
inherent advantage of a parallel-triplicate configuration, i.e. obvious independence of lanes except
at the final consolidation. The latter could comprise fan-in of all three lanes to each actuator coil,
i.e. double fan-in to each actuator, in this application and quite possibly would require long-term datum
equalisation to prevent build up of large sustained differences due to sensor tolerances and 'integration'
in the control law. In comparison, each actuator coil is wholly in-lane for the triplex configuration.

6.2 The triplex configuration operates upon the principle of a two out of three majority vote, whereby a
faulty lane is identified and subsequently rejected as a result of differences between some specific
performance index or indices of that lane and the corresponding performance indices of the two good lanes.
For failure detection, such a system must obviously allow the build-up of some level of lane differences
prior to initiating a lane rejection; consequently it is necessary, as with all monitored systems, to find
a compromise solution which on the one hand keeps the aircraft transient excursions following a defect and
lane rejection within the specified levels, and on the other hand does not result in an unacceptably high
incidence of nuisance lane rejections. From this point of view the use of digital processors in the
multiplex configuration is particularly advantageous since tolerance effects normally associated with the
computing functions in analogue systems can be completely eliminated. However, the failure analysis (see
section 13) pessimistically assumed a nuisance-disconnect rate for the digital processor equal to its
defect rate.
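
The two out of three vote and the compromise between failure transients and nuisance rejections can
be sketched as a comparator with a difference threshold and a persistence count. This is a minimal
illustration, not the DSAS monitor itself: the class name, the threshold, the persistence count and the
choice of a simple per-cycle comparison are all assumptions. A larger threshold or longer persistence
reduces nuisance rejections but allows a larger transient before the faulty lane is removed.

```python
from typing import List, Optional


class TriplexVoter:
    """Rejects a lane that disagrees with both others for several cycles."""

    def __init__(self, threshold: float, persistence: int) -> None:
        self.threshold = threshold      # allowed lane difference (hypothetical units)
        self.persistence = persistence  # consecutive cycles required before rejection
        self.counts = [0, 0, 0]
        self.rejected: Optional[int] = None

    def update(self, lanes: List[float]) -> Optional[int]:
        """Process one cycle of three lane values.

        Returns the index of the rejected lane, or None if all lanes remain.
        """
        if self.rejected is not None:
            return self.rejected
        a, b, c = lanes
        ab = abs(a - b) > self.threshold
        ac = abs(a - c) > self.threshold
        bc = abs(b - c) > self.threshold
        # A lane is suspect when it differs from both others while they agree.
        suspect = [ab and ac and not bc,
                   ab and bc and not ac,
                   ac and bc and not ab]
        for i in range(3):
            self.counts[i] = self.counts[i] + 1 if suspect[i] else 0
            if self.counts[i] >= self.persistence:
                self.rejected = i
        return self.rejected
```

A single-cycle disagreement resets the count on the next good cycle, so only a sustained difference
leads to rejection.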

Integration with the existing Sea King power supplies and actuation system has prevented the
extension of the defect survival philosophy in these areas during the present programme; however, such
extensions are seen as logical developments for future systems.

The triplex philosophy has previously been successfully applied by Smiths Industries for the Trident
AFCS. This was the first (and to date, i.e. mid-1976, the only) aircraft to be certificated for Category
III operations by the CAA and this background of experience is now supported by the successful implementation
and testing of the triplex digital control system previously discussed, the main features of which are being
incorporated into the DSAS.

7 CONTROL TECHNIQUES

The system achieves autostabilisation of the helicopter in a conventional manner, namely pitch and
roll stabilisation by operating respectively the fore and aft, and the lateral cyclic control of the main
rotor in response to rate and attitude error information, and yaw stabilisation by operating the tail
rotor blade angle collectively in response to yaw rate signals. A limited heading hold facility is also
provided by operating the tail rotor in response to proportional and integral of heading error signals.
This latter facility is automatically cut out when the pilot pushes on the yaw pedals.
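
The structure of these control laws can be sketched as follows. The gains, sign conventions and the
resetting of the heading integrator on cut-out are assumptions made for illustration; the paper does
not give numerical values or the detailed form of the laws.

```python
from dataclasses import dataclass


def cyclic_demand(rate: float, attitude_error: float,
                  k_rate: float = 1.0, k_attitude: float = 2.0) -> float:
    """Pitch or roll demand from rate and attitude-error feedback (hypothetical gains)."""
    return -(k_rate * rate + k_attitude * attitude_error)


@dataclass
class YawChannel:
    """Yaw rate damping with proportional-plus-integral heading hold."""
    k_rate: float = 1.0       # hypothetical gains, not from the paper
    k_heading: float = 0.5
    k_integral: float = 0.1
    integral: float = 0.0

    def step(self, yaw_rate: float, heading_error: float,
             pedals_pushed: bool, dt: float) -> float:
        demand = -self.k_rate * yaw_rate
        if pedals_pushed:
            # Heading hold is cut out while the pilot is on the pedals;
            # clearing the integrator here is an assumption.
            self.integral = 0.0
            return demand
        self.integral += heading_error * dt
        return (demand
                - self.k_heading * heading_error
                - self.k_integral * self.integral)
```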

The attitude and rate information required for the pitch and roll axes may be obtained in a number
of alternative ways:

(i) by using separate rate and attitude sensors,

(ii) by using a single sensor to measure attitude and deriving a rate signal component by phase
advance techniques,

(iii) by using a single sensor to measure the rate and deriving an 'attitude' component by
integration or phase lag techniques.
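
Option (iii) can be illustrated with a first-order lag (a "leaky" integrator) applied to a rate
signal. This is a sketch under assumed values: the time constant, sample interval and function name
are hypothetical. Over intervals much shorter than the time constant the output behaves like the
integral of rate, i.e. a short term attitude; over long intervals it settles at rate times the time
constant rather than growing without limit, which is why a separate long term datum is still needed.

```python
from typing import Iterable, List


def pseudo_attitude(rates: Iterable[float], dt: float, tau: float) -> List[float]:
    """First-order lag of a rate signal, returning the lagged 'attitude' samples.

    Discrete (forward Euler) form of: d(att)/dt = rate - att / tau.
    Requires dt much smaller than tau for a reasonable approximation.
    """
    attitude = 0.0
    out = []
    for rate in rates:
        attitude += dt * (rate - attitude / tau)
        out.append(attitude)
    return out
```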

Helicopter systems so far designed and developed have normally been based upon either (ii) or (iii)
and for Naval applications, involving considerable periods of low speed or hovering flight at low altitude,
the attitude based systems have generally been preferred, since such systems can provide a stable long
term attitude datum. Since the rate gyroscope is basically a simpler, and consequently a more reliable
device than the vertical gyroscope, it was decided to use the very high reliability Series 700 gas bearing
rate gyroscopes manufactured by Smiths Industries Limited as the primary system sensors, providing both
rate and short term attitude signals, and to obtain long term attitude signals from the three vertical
gyroscopes fitted as basic equipment in the RAE Sea King helicopter.

The specified short term attitude stabilisation performance can be achieved in the absence of any
vertical gyro signals. However, under these conditions there is some deterioration in the long term
attitude hold performance. During turning flight the pitch and yaw rate signals are combined as a function
of roll attitude to mitigate the loss of height which would otherwise occur as a consequence of using
body-fixed rate gyroscopes.
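
The paper does not state the combination law. One plausible form, given here only as an assumption,
is the standard Euler kinematic relation for the rate of change of pitch attitude in terms of
body-axis pitch rate q, yaw rate r and roll attitude. The sketch shows why the correction matters: in
a steady level turn the body-fixed pitch gyro senses a non-zero rate although pitch attitude is
constant, and the combined signal removes that apparent rate.

```python
import math


def pitch_attitude_rate(q: float, r: float, roll_rad: float) -> float:
    """Euler pitch-attitude rate from body-axis pitch rate q and yaw rate r."""
    return q * math.cos(roll_rad) - r * math.sin(roll_rad)


def steady_turn_body_rates(turn_rate: float, roll_rad: float):
    """Body-axis (q, r) in a steady level turn at zero pitch attitude."""
    return turn_rate * math.sin(roll_rad), turn_rate * math.cos(roll_rad)


if __name__ == "__main__":
    q, r = steady_turn_body_rates(turn_rate=0.1, roll_rad=0.5)
    print("sensed pitch rate:", q)
    print("combined pitch-attitude rate:", pitch_attitude_rate(q, r, 0.5))
```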


* One of the assumptions involved in these estimates is that a defective lane which does not cut itself out
would have equal probabilities of 'positive' spurious activity, spuriously-zero activity and 'negative'
spurious activity.

The autostabilisation system operates through series actuators, having an authority limitation of
± 10% of full control travel in pitch and roll and ± [?] in yaw. To avoid saturation of this limited
authority during manoeuvres involving high rates and/or large attitude changes, electrical command signals
proportional to the fore and aft and the lateral displacements of the cyclic control column are generated
by position pick-offs and combined with the rate gyro feedback terms.
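
The effect of the stick pick-off term can be sketched as below. The way the two terms are combined
(a simple difference) and the units (per cent of full control travel) are assumptions for
illustration; only the ± 10% pitch and roll authority is taken from the text. Without the stick term,
a large rate feedback demand during a manoeuvre saturates the series actuator; with it, the net demand
stays inside the authority.

```python
def series_actuator_demand(rate_term: float, stick_term: float,
                           authority_pct: float = 10.0) -> float:
    """Combine feedback and stick terms, then limit to the series actuator authority.

    All quantities are in per cent of full control travel (assumed scaling).
    """
    demand = stick_term - rate_term
    return max(-authority_pct, min(authority_pct, demand))
```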

Trimming signals are also injected into the system by means of the pitch, roll and heading trim
knobs on the pilot's control panel. A hover indicator provides the pilot with measures of the pitch,
roll, yaw and collective actuator deviations from their central positions. The defect survival
autostabilisation system is integrated with the standard hover indicator on the interseat console.

An initial assessment of the system performance and control law optimisation was carried out using
a general purpose hybrid computing facility. These studies included an examination of the effects of
variations in the computer iteration rate, and of quantisation of the A/D and D/A converters. From
these investigations it was decided to incorporate 12 bit A/D and 10 bit D/A converters and to use a
5 3 millisecond computer cycle time.
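
The resolution implied by these word lengths can be illustrated with a simple quantiser. The scaling
is an assumption: the sketch supposes a bipolar converter whose range spans plus or minus a chosen full
scale, for example a 10 bit D/A spanning the ± 10% series authority, which the paper does not confirm.
The function names are hypothetical.

```python
def lsb(full_scale: float, bits: int) -> float:
    """Step size of a bipolar converter spanning +/- full_scale."""
    return 2.0 * full_scale / (2 ** bits)


def quantise(value: float, full_scale: float, bits: int) -> float:
    """Round `value` to the nearest converter level, saturating at the range limits."""
    step = lsb(full_scale, bits)
    limited = max(-full_scale, min(full_scale - step, value))
    return round(limited / step) * step


if __name__ == "__main__":
    # 10 bit D/A over an assumed +/-10% authority, 12 bit A/D over a unit range.
    print("D/A step (% travel):", lsb(10.0, 10))
    print("A/D step (fraction of full scale):", lsb(1.0, 12))
```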

8 FEATURES OF THE REDUNDANCY CONFIGURATION

8.1 The DSAS system comprises the following units:

- three  SDC  10  digital  computers 

- a triplex rate gyroscope for each axis of control, i.e. pitch, roll and yaw,

- a triplex cyclic control column position pick-off for both pitch and roll control,

- a pilot's  control  unit  incorporating  triplex  pitch  and  roll  trimming  facilities  and  a 
simplex  yaw  trimmer. 

The system interfaces with the following existing facilities in the RAE Sea King helicopter:

- three identical vertical gyroscopes generating three wire synchro pitch and roll attitude signals,

- the compass system providing a three wire synchro heading signal,

- the autopilot pitch and roll command signals from the Mk. 31 AFCS,

- yaw pedal discriminants to cancel the heading hold facility,

- the series connected pilot and co-pilot cut-out buttons,

- the interseat null indicator.

A schematic of the triplex configuration and its integration with the existing actuation system
is shown in Fig. 1.

8.2 Sensor inputs

The triplex rate, stick pick-off and trimmer signals are fed on a lane one sensor to lane one
computer basis. Following A/D conversion, the sensor data is serially crossfed between computers using
high integrity interlane data highways and subsequently amalgamated within each computer to form the mean
of three in the triplex mode, or the mean of two in a duplex mode. This amalgamation process ensures that
all three digital computers are processing identical input information and consequently generating identical
digital output data. The attitude information from the three vertical gyroscopes used for long term pitch
and roll control is also fed on a sensor one to computer one basis. In this case however, since the pilot
will normally be monitoring his attitude information from the No. 1 vertical gyro by means of his attitude
indicator, the signals from this unit will normally be used for control computations in all three lanes.

The signals from all three gyros will be compared for monitoring purposes and in the event of a failure in
the No. 1 gyro unit, automatic switch over to No. 2 vertical gyro will occur. In the event of this gyro
also failing, the long term attitude monitoring capability will automatically cut out.
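
The amalgamation and vertical gyro selection described in this section can be sketched as follows. This is only an illustrative sketch: the function names, the use of boolean validity flags and the lane numbering are assumptions, not details taken from the flight software.

```python
from statistics import fmean


def amalgamate(lane_values, lane_valid):
    """Mean of the lane signals currently accepted as valid.

    Three valid lanes give the mean of three (triplex mode);
    two valid lanes give the mean of two (duplex mode).
    """
    accepted = [v for v, ok in zip(lane_values, lane_valid) if ok]
    if len(accepted) < 2:
        # Assumption: a single remaining signal is not amalgamated.
        raise ValueError("fewer than two valid lanes")
    return fmean(accepted)


def select_vertical_gyro(gyro1_ok, gyro2_ok):
    """Gyro used for control: No. 1 normally, No. 2 after a No. 1 failure.

    Returns None when both have failed, i.e. the long term attitude
    facility cuts out.
    """
    if gyro1_ok:
        return 1
    if gyro2_ok:
        return 2
    return None
```

Because every computer applies the same rule to the same crossfed data, each lane arrives at the same consolidated value, which is the property the text relies on.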

8.3 Actuator command signals

In the existing Sea King AFCS, the simplex autostabilisation input signals to each auxiliary
actuator are fed as a common parallel input into two servo valve coils. In order to provide a measure of
failure survival capability with the existing actuation system it was decided to input each coil separately
from No. 1 and No. 2 computers respectively, the third computer output being fed back to its own A/D converter
through a simple resistive model of the coil.

With this configuration two reversionary conditions may exist following the first defect, namely,

(i) a duplex system if computer 3 fails, or

(ii) a duplex system with one active output if either computer 1 or 2 fails.

In the latter condition, the overall system gain will be maintained by doubling the gearing in the
working processors. The pilot can attempt to reinstate the triplex situation by momentarily pushing the
POWER switch on the PCU into the forward ENGAGE position.

Following total system rejection due to a second defect, the pilot has the option of attempting
to operate in a simplex mode on either lane 1 or lane 2 by using a CHANNEL SELECT switch on the pilot's
control unit and re-engaging the system. For safety reasons it is arranged that it will be impossible
for the pilot to engage the system in a simplex mode directly from the POWER OFF condition, and it is not
possible to revert from simplex to multiplex operation without removing power and re-engaging the system.
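
The two reversionary conditions after a first defect, and the gearing change that goes with them, can be summarised in a few lines. This is a sketch only: the function name, the dictionary keys and the representation of gearing as a multiplying factor are assumptions made for illustration.

```python
def reversionary_state(failed_computer):
    """State of the system after a first defect in the named computer.

    Computers 1 and 2 each drive one servo valve coil; computer 3 drives
    only a resistive model of the coil, so its loss leaves both coils driven.
    """
    if failed_computer == 3:
        # Both actuator coils are still driven: gearing is unchanged.
        return {"mode": "duplex", "active_outputs": 2, "gearing_factor": 1.0}
    if failed_computer in (1, 2):
        # Only one coil is driven: gearing is doubled to hold overall gain.
        return {"mode": "duplex", "active_outputs": 1, "gearing_factor": 2.0}
    raise ValueError("computer number must be 1, 2 or 3")
```

In both cases the product of active outputs and gearing factor is the same, which is what maintaining the overall system gain amounts to here.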

8.4 Computer synchronisation

The three digital computers operate at their own independent clock frequencies and the accumulation
of excessive drift between them is prevented by software synchronisation techniques using synchronisation
status pulses transmitted between computers along dedicated sync-stat highways.

The synchronisation routine is divided into two sub-routines, a 'coarse' synchronisation which pulls
the computers to within a few instructions of one another following start up, and a 'fine' synchronisation
routine which pulls all computers into line. A synchronisation window is defined in terms of programme
steps, such that if any one computer falls outside this window, the amber warning in the pilot's control
unit is operated, indicating a non viable triplex system where loss of a single computer due to a further
defect could result in a system cut-out.
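
A minimal sketch of the synchronisation window check is given below. The text does not say how the window is referenced, so this sketch assumes that each computer's programme step count is compared with the median of the three; the function names and lane numbering are likewise assumptions.

```python
from statistics import median


def lanes_outside_window(program_steps, window_steps):
    """Lane numbers (1..3) whose step count lies outside the window.

    Assumption: the window is centred on the median step count.
    """
    reference = median(program_steps)
    return [
        lane
        for lane, step in enumerate(program_steps, start=1)
        if abs(step - reference) > window_steps
    ]


def amber_sync_warning(program_steps, window_steps):
    """True when any one computer falls outside the synchronisation window."""
    return bool(lanes_outside_window(program_steps, window_steps))
```

For example, step counts of 100, 120 and 101 with a window of 4 steps would flag lane 2 and raise the amber warning.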

8.5 Engage-disconnect facilities

The engage-disconnect system has been designed according to the principle that a faulty computer is
not required to make any decisions with respect to either its own or any other computer's validity. In
the triplex mode of operation a faulty computer is rejected as a result of validity assessments made by
the other two computers. Disconnection of the analogue output signals is achieved by means of duplicated
logic circuits on each computer which contain a pair of relay contacts, controlled by validity discriminants
transmitted from the other two computers via dedicated interlane transmission lines.

For the duplex or monitored simplex modes of operation, measures are required to ensure that in the
event of a computer failing such that it can no longer control the disconnect relay in the other computer,
the latter can nevertheless still be cut out. This is achieved by means of a second pair of relay contacts
in the engage-disconnect logic circuits which are controlled by two self generated validity discriminants.

The pilot is provided with the option of flying in a simplex mode following a second defect by using
either lane 1 or lane 2. This is achieved by moving the CHANNEL SELECT toggle switch on the pilot's
control unit into the appropriate position where switch contacts will bypass the engage-disconnect logic
in the selected lane and ensure that there will be no output from the other lanes.
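
The effect of the CHANNEL SELECT switch on lane outputs can be sketched as a small truth function. The names below are hypothetical, and the single boolean `logic_engaged` stands in for the whole relay-contact arrangement, whose detailed series or parallel wiring is not modelled here.

```python
def lane_output_enabled(lane, channel_select, logic_engaged):
    """Whether a lane may drive its output.

    lane:            lane number, 1, 2 or 3
    channel_select:  "MULTIPLEX", "SIMPLEX 1" or "SIMPLEX 2"
    logic_engaged:   state of that lane's engage-disconnect logic
    """
    if channel_select == "MULTIPLEX":
        # Normal operation: the engage-disconnect logic decides.
        return logic_engaged
    selected = {"SIMPLEX 1": 1, "SIMPLEX 2": 2}[channel_select]
    # Simplex: the selected lane bypasses its logic, all others are inhibited.
    return lane == selected
```

Note that in a simplex position the result no longer depends on `logic_engaged`, which reflects the bypass described in the text.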

8.6  Warnings 

The multiplexed sensor signals are monitored for excessive differences, in which event the faulty
sensor signal will be excluded from the amalgamation process, the system continuing to operate using the
mean of the remaining two input signals. The AMBER integrity warning indicator (Fig. 2) will be lit and
the appropriate LRU symbol displayed on a seven bar display on the front of each computer.

The amber integrity warning light will also be operated as a result of a lack of synchronisation
between computers and in the event of any computer power line relay being open circuit.

When a complete system cut-out occurs, a RED integrity indicator on the PCU is illuminated.
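
The sensor monitoring described in this section might be sketched as below. The isolation rule (a signal is rejected when it disagrees with both of the others while those two agree) and the treatment of cases where no single signal can be isolated are assumptions; the text states only that excessive differences lead to exclusion of the faulty signal.

```python
def monitor_sensors(values, threshold):
    """Compare three lane signals and consolidate them.

    Returns (consolidated_value, excluded_lane), where excluded_lane is
    None when all three agree, otherwise the lane number 1..3 rejected.
    """
    a, b, c = values
    ab = abs(a - b) > threshold
    ac = abs(a - c) > threshold
    bc = abs(b - c) > threshold
    if not (ab or ac or bc):
        return (a + b + c) / 3, None
    if ab and ac and not bc:
        excluded = 1
    elif ab and bc and not ac:
        excluded = 2
    elif ac and bc and not ab:
        excluded = 3
    else:
        # Assumption: this sketch does not resolve ambiguous disagreement.
        raise ValueError("cannot isolate a single faulty signal")
    remaining = [v for lane, v in enumerate(values, start=1) if lane != excluded]
    return sum(remaining) / 2, excluded


def amber_warning(excluded_lane, sync_lost, power_relay_open):
    """Amber integrity warning conditions listed in this section."""
    return excluded_lane is not None or sync_lost or power_relay_open
```

With signals of 10.0, 10.5 and 20.0 and a threshold of 2.0, lane 3 is excluded and the system continues on the mean of the other two.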

9 INTERLANE  DATA  TRANSMISSION  SYSTEM 

Inter-processor transmission links are used for the serial transmission of digital data between the
three computers. This interchange of data is required:

(1) to consolidate the individual lane sensor information, thereby providing common input data
for all three computers in order to derive the maximum benefits bestowed by the identical
processing capabilities of the three computers,

(2) to formulate difference signals for the identification of faulty sensors or computers and to
generate warning and cut-out discriminants when the computer parameters exceed pre-defined
levels.

The design of a high integrity interlane data transmission system is essential in order to preserve
the failure survival capability conferred by the triplex redundancy. The transmitter-receiver system must
be such that there is no possibility of misinterpretation of transmitted data in one receiving computer
and correct interpretation in the other, due for example to weak transmission from the sending computer,
since such a situation may result in an initial rejection of a perfectly good computer subsequently leading
to a complete system cut-out. Measures to avoid this type of situation have been implemented in the
transmission system and the associated software.

The practicability of replacing direct electrical interconnections between computers by the use of
fibre optic transmission links has already been demonstrated by Smiths Industries during the experimental
investigation of a duplex engine control system. For the Sea King it was decided to retain electrical
interconnections in the interests of both development time and project costs. In any case, fully developed
optical links do not automatically supersede electrical links because the latter can certainly be designed
to meet current BQ Control Plans; i.e. a choice would be based on cost and reliability comparisons.
10.1 Pre-flight testing

The pre-flight tests, which are initiated by one actuation of the test button on the pilot's
control unit, form an essential process in the achievement of the desired integrity, ensuring a fully
operational system prior to take-off. To avoid inadvertent in-flight operation of the test button, this
is covered by a spring loaded hinged flap. The face of the test button is divided into an upper white
panel displaying the legend TESTING when illuminated and the lower red and green panels displaying NO and GO
respectively when illuminated. The tests are also terminated by pressing the button.

The pre-flight tests have been integrated with the pilot's normal pre-flight drill and require pilot
participation when the white TESTING segment of the button is flashing. Prior to these an automatic test
sequence is entered, and after them the green GO panel is illuminated if no fault is detected, or the red
NO indicator is lit and the appropriate faulty unit is indicated on the computer seven bar displays.

In addition to the rate gyroscope tests discussed in section 12.3 the following are some of the
basic features which are checked out during the tests:

(a) Correct functioning of all indicator lamps.

(b) Ability to effect engagement in the multiplex mode, the full in-flight program being exercised
using 'fixed' sensor data.

(c) Tracking accuracies and scale factor of:
(i) pitch and roll stick pick-offs
(ii) pitch and roll trimmer output signals.

(d) Heading trimmer sensitivity and heading disconnect logic.

(e) Correct functioning of pilot and co-pilot autostabiliser cut-out buttons.

(f) Engage and disengage logic.

(g) Correct functioning of A/D and D/A converters.

Additional software checks on the correct functioning of the computers are also carried out,
together with the in-flight tests discussed below.

10.2  In-flight  testing 

In addition to those tests required for sensor and output data monitoring purposes, performed each
computing cycle as part of the basic in-flight operating programme, a number of additional in-flight checks
are carried out to provide added confidence in the correct functioning of the computers when operating
in the airborne environment. These include store checksums, checks on the engage-disconnect programmes
and limiter sub-routines.
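
A store checksum of the kind mentioned above can be illustrated with a simple additive check. The text does not specify the algorithm used; the 16 bit modular sum below is an assumption chosen only because the processor word length is 16 bits, and the function names are hypothetical.

```python
def store_checksum(words):
    """Sum of the stored words, truncated to 16 bits."""
    total = 0
    for word in words:
        total = (total + word) & 0xFFFF
    return total


def store_ok(words, expected_checksum):
    """True when the store contents still match the recorded checksum."""
    return store_checksum(words) == expected_checksum
```

A check of this form detects any single corrupted word, although errors in several words can cancel; a production scheme might well use something stronger.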

10.3  Installation  testing 

In the pre-flight check the program halts only when the pilot's normal pre-flight drill is involved and at
conclusion of testing, so that the GO indication has to be acknowledged. However, system status indications
are flashed through in the automatic test sequence and the program has been provided with a 'halts' option
covering each change of the Amber Integrity, Red Integrity, Lane In/Out, or Null flag indication. This
option should be taken up after any disturbance of the system installation so that the inspector signing
for the autostabiliser has detailed evidence of the whole of the test sequence; the latter is identical
to the pre-flight check apart from the additional halts.

The installation check is initiated by operating the test button twice within 2 seconds. As in the
pre-flight check, the white TESTING segment flashes when participation is required. If this is simply
acknowledgement of the system status, the flashing will alternate with a flashing GO window (or flashing
NO window in the case of three tests checking correct operation of the NO indication); the inspector
restarts the sequence by pressing the test button.
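
The distinction between the two ways of starting the tests (one actuation for the pre-flight check, two actuations within 2 seconds for the installation check) can be sketched as follows. The function name, the use of press timestamps in seconds and the inclusive treatment of the 2 second limit are assumptions.

```python
def classify_test_request(press_times, double_press_window=2.0):
    """Decide which test sequence a series of button presses requests.

    press_times: ascending timestamps, in seconds, of test button presses.
    Returns "installation", "pre-flight", or None if there were no presses.
    """
    if not press_times:
        return None
    if (len(press_times) >= 2
            and press_times[1] - press_times[0] <= double_press_window):
        return "installation"
    return "pre-flight"
```

A second press arriving later than the window is not treated as part of the request here; in the real system a later press terminates the tests.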

11 TEST RIG

Prior to installation in the aircraft, the complete flight system is being evaluated on a ground test
rig. This rig includes three engineers' racks, each of which will house one computer and will contain an
8K core store module and other units required to monitor and control the computers. A central console
housing the pilot's control unit, system sensors and junction box is also provided and an analogue computer
is used for closed loop simulation work. An interface unit between one of the digital computers, the
pilot's control unit and a general purpose digital computer is also provided to facilitate automatic
processor fault injection work.


12 SYSTEM UNIT DETAILS

12.1  The  digital  computers 

The computer incorporates a 16 bit parallel processor with a 3 bit function field and 11 bits of
direct memory addressing. It has ten working registers and automatic sub-routine entry. Exit is
achieved using a hardwired link pointer.

Some of the other main features are as follows:


Clock rate              8 MHz
Add/subtract time       3.75 us
Multiply/divide time    15 us
Programme store         3.9 PROM
Volatile store          0.79 BIPOLAR
A/D converter           12 bits, sample and hold time 100 us
D/A converter           10 bits
Input multiplexer       up to 32 inputs (of which 21 are sensor signals)
Output demultiplexer    5 outputs (of which 3 are control signals)
Update rate             55 «s


The input/output data and interlane data transmissions are carried out serially.


For the Sea King application the computer has been repackaged into a half ATR short case, fitted with
a front end dog house for the power supplies. The unit contains 19 cards incorporating the input-output
peripherals, interlane data transmission system and the engage-disconnect logic in addition to the central
processor unit (CPU), the programme store and volatile store.


Whilst complete segregation between the three computer CPUs can be claimed, this is not possible
for the peripherals involved in the interchange of data and discriminants. Careful consideration has
consequently been given to the detailed design of the peripheral facilities in order to achieve good
segregation. The dual in line TTL packages and other components are mounted on one side of the ten layer
multilayer boards. Individual layers of the boards have been allocated for specific functions, including a
number of exclusive zero and 5 volt supply planes. Minimum physical separation between individual tracks,
connector pins and electronic packs associated with specific functions and redundancy of earth connections
have also been specified. The cards are mounted in the computer case as dual modules with an ashlaa
segregation barrier between each module.


The rear face of each computer incorporates a 32 bit gearing pointer, the connections of which can
be modified when the aircraft is grounded, to implement changes to a range of selected system parameters
such as gearings, comparator settings etc. Allocation of 2 bits to one specific parameter allows for the
selection of any one of four pre-defined values for this parameter.
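
The way a 2 bit allocation selects one of four pre-defined values can be sketched as below. The bit ordering within the 32 bit pointer, the field numbering and the function name are assumptions; the text gives only the pointer width and the 2 bit, four value relationship.

```python
def decode_pointer_field(pointer, field_index, options):
    """Pick the pre-defined value selected by one 2 bit pointer field.

    pointer:      the 32 bit gearing pointer, read as an integer
    field_index:  which 2 bit field to read, 0..15 (field 0 = lowest bits)
    options:      the four pre-defined values for this parameter
    """
    if len(options) != 4:
        raise ValueError("a 2 bit field selects among exactly four values")
    if not 0 <= field_index < 16:
        raise ValueError("a 32 bit pointer holds sixteen 2 bit fields")
    code = (pointer >> (2 * field_index)) & 0b11
    return options[code]
```

For instance, with a pointer of binary 1001, field 0 holds code 01 and field 1 holds code 10, so the second and third values of the respective option lists are selected.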

A seven bar LRU indicator on the front of each computer facilitates the identification of faulty
units located during pre-flight or in-flight testing. The indicator shows a single number between
1 and 9, thereby identifying a specific unit.


12.2  Pilot's  control  unit 


The pilot's control unit has been designed to fit into the limited available space on the
interseat console, replacing the autostabilisation controls of the normal AFCS. It was decided to
incorporate both the pilot's control facilities and the engagement state and warning indicators in the one
unit, although it is appreciated that for a production system separation of the control and warning functions
might be operationally more desirable.


Poke-home contacts and flying leads have been used throughout the unit in order to facilitate easy
maintenance and careful consideration has been given to lane segregation requirements within the unit.


The following facilities are provided:

(a) Power supply switching to the three computers via the multi-pole power switch.

(b) Engagement discriminant toggle pulses to each computer on pilot selection of ENGAGE via the
POWER switch.

(c) Red and amber system integrity warnings to the pilot.

(d) A CHANNEL SELECT switch, allowing pilot selection of MULTIPLEX, SIMPLEX 1 or SIMPLEX 2 modes
of system operation.

(e) Three lane engagement state indicators showing lane IN, lane OUT and power OFF states.

(f) On pilot selection, vertical gyro reject discriminants to the computers and indication of
vertical gyro 1 and 2 states to the pilot.

(g) On pilot selection, heading reject discriminants to the computers and indication of the heading
engagement state to the pilot.

(h) On pilot selection, ground test discriminant to the computers and indication of the testing
state and result to the pilot.

(i) Means of providing multiplex signals to the computers from pilot operated thumb wheels for
pitch and roll trim.

(j) Means for adding to the heading synchro signal a simplex heading trim signal derived from a
pilot operated knob.

12.3 Rate gyroscope unit

Identical triplex units are provided for all three axes using 43°/second rate gyroscopes, with a
mounting frame suitable for installation to measure rates about any of three specified axes. The three
rate gyroscopes in each unit are aligned with their axes parallel in a machined mounting block which is
fixed to the cast and machined mounting frame. A set of four fixing lugs is incorporated on two faces
of the mounting frame and each set is accurately machined at right angles to the other. All three gyro
units will be mounted on a single horizontal platform in the helicopter.

The rate gyroscope selected for the system is the Smiths Industries Ltd. Series 700 miniature gyro
with an air bearing providing virtually zero bearing friction when running and consequently a very high
reliability, with a predicted gyro MTBF of 20000 hours. An 11°/second maximum rate version of this gyro
is currently being phased into service operation in the BEA Trident fleet as a replacement for ball bearing
gyroscopes.

For the Sea King installation, demodulators provide DC output signals from the gyroscopes, and
temperature compensation circuits are provided for both gain and datum characteristics of the gyros.
These ensure a high degree of stability over the operating conditions and close inter-lane tracking of
the gyro output signals.

Gimbal torquing facilities are provided in each rate gyroscope for use during the pre-flight tests
to confirm the functionability and tracking accuracies of the three gyroscopes in each unit. The rotor
speed of each rate gyro is monitored by means of an optical sensor system. This uses a fibre optic bundle,
a light source, a photo transistor and associated electronics to generate output pulses at the gyro wheel
speed. These pulses are transmitted to the digital computers where they are processed in order to detect
a failure to achieve synchronous speed.
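
The rotor speed check performed on the optical sensor pulses might take a form such as the following. The pulse-counting method, the one pulse per revolution default, the 5 per cent tolerance and both function names are assumptions made for illustration; the text states only that the pulses are processed to detect a failure to reach synchronous speed.

```python
def rotor_speed_rev_per_s(pulse_count, interval_s, pulses_per_rev=1):
    """Rotor speed estimated from pulses counted over a fixed interval."""
    if interval_s <= 0 or pulses_per_rev <= 0:
        raise ValueError("interval and pulses per revolution must be positive")
    return pulse_count / (pulses_per_rev * interval_s)


def at_synchronous_speed(measured, synchronous, tolerance_fraction=0.05):
    """True when the measured speed is within tolerance of synchronous speed."""
    return abs(measured - synchronous) <= tolerance_fraction * synchronous
```

A gyro for which `at_synchronous_speed` stays false after run-up would be reported as having failed to achieve synchronous speed.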

12.4  Stick  position  sensor  units 

Triplex cyclic stick position sensor units are used for both pitch and roll control. These are
located in the aircraft control run such that identical units can be used for both axes. The output
signals from each unit are obtained from three high precision plastic potentiometers, each of which is
connected to the input lever through a clutch assembly, ensuring that the lever does not jam as a result
of a seized potentiometer. The electrical and mechanical sections of the unit are segregated by a gear
mounting plate and a high standard of electrical segregation has been maintained between the three
potentiometers and associated wiring.

13 ASSESSMENT OF SYSTEM RELIABILITY AND INTEGRITY

13.1 Although the proving of software correctness was not referred to in the RAE's requirements, it is
thought worthwhile to outline Smiths Industries' approach to this matter; see section 14. The approach to
assessment of the effect on the autostabiliser of defects in the system units and in associated equipment is
discussed below. This failure analysis was restricted to the multiplex mode (except as required by the
pre-flight check) by agreement with the RAE and its findings with regard to the system reliability and
integrity were as follows:-

(i) Confirmation that any single defect of the system would have negligible effect upon the system
performance.

(ii) Confirmation that any multiple defects whose effect would not meet the integrity requirements in
section 4.3 would have a total probability of less than 10”' per flying hour. In fact for
each relevant fault sequence, regarded as a passive part followed by an active part, the
probability of each of its parts is remote; i.e. less than 10”^“ per flying hour per sequence.

(iii) An estimated MTBF substantially greater than 10^ flying hours for the initial comparator
settings; each setting is 2 deg per sec, which accommodates all analogue tolerances.

13.2 The failure analysis philosophy:- It is essential that a failure assessment, i.e. the record of a
failure analysis, conveys confidence in the completeness of the analysis. In Smiths Industries' experience
this necessitates both bottom-up and top-down procedures except for simple systems without ICs. Even in
the latter case, the use of both types of procedure could well be more efficient than a bottom-up procedure
alone. System reliability and integrity requirements are expressible qualitatively or quantitatively,
i.e. in terms of fault-tolerance or probability of effects. However, the approach to failure analysis is
essentially the same, so it is unimportant that for a given system both modes of expression are likely to be
involved.

Considering firstly bottom-up procedures; for reliability and integrity requirements in terms of fault
tolerance, such a procedure nominally relies on knowledge of every component (including interconnection)
failure mode that could conceivably occur in service. However, for proprietary IC components such knowledge
is normally, and is likely to remain, unavailable; the IC manufacturer being naturally more interested in
failure mechanisms. Further, the in-service returns for complex ICs cannot be relied on for guidance as to
conceivably-occurring failure modes. For example, suppose the true classification of a particular IC
failure is 'Wrong state table' but that diagnosis at module and component levels found, quite adequately for
the particular application, the IC as 'Output stuck at X'. Then the latter is likely to be the in-service
return. When the requirements are in terms of probability of effects it is possible that a particular
classification would have an insignificant failure rate and thus could be ignored; otherwise the situation
is no clearer than for requirements in terms of fault tolerance. Thus a bottom-up procedure, such as the
usual way in which an FMEA is conducted, has to rely to a certain extent in practice on the analyst
postulating component failure modes that could conceivably occur in service.

Considering now top-down procedures; for reliability and integrity requirements in terms of fault
tolerance, such a procedure relies on the analyst considering every locality failure mode that is both
possibly relevant to the requirements and of conceivable occurrence in service and postulating a
sufficient set for the failure analysis. Provided the redundancy is on a regional rather than a
component basis and is in general implemented with matching segregation, such a task has been found
practicable. That is, (i) the analyst can satisfy himself that the postulated locality failure modes
are a sufficient set and (ii) the failure assessment can convey the analyst's confidence, e.g. to equipment
certification authorities, principally by showing the incompatibility of those failure modes whose
occurrence in service is conceivable but are not postulated with failure modes that could possibly be
relevant to the requirements. When the requirements are in terms of probability of effects, a top-down
procedure relies on the analyst considering every locality failure mode that is possibly relevant to the
requirements and would have a significant rate of occurrence in service. Thus the task of postulating
a "sufficient set" is somewhat eased.

13.3  Application  of  the  failure  analysis  philosophy 

Because redundancy in this system is wholly on a regional basis, and has been implemented with
complete segregation (i.e. bridging between redundant regions is inconceivable) or with good segregation
(i.e. the occurrence in service of bridging between redundant regions is inconceivable) with only minor
exceptions, a top-down approach to the whole failure analysis was selected as being the most efficient.

Two examples of locality failure modes that were postulated are Spurious Bit and Open-circuit Interconnection.
Because the latter is also a component failure mode there were no sub-classes but the former needed three,
viz. Stuck at 0, Stuck at 1 and Soft Output. Also sub-sub-classes were necessary for Soft Output, to cover
the three types of response that could conceivably occur at a fan-out:-

Soft Output (0), i.e. at least one load sees the driver as Stuck at 0 and the remainder see it behaving
normally.

or Soft Output (1), i.e. at least one load sees the driver as Stuck at 1 and the remainder see it behaving
normally.

or Soft Output (0/1), i.e. all loads see the driver as Stuck at 0 or 1 and at least one cognisance differs
from the remainder.
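
The Spurious Bit sub-classes can be made concrete by classifying what each load at a fan-out observes. The sketch below is illustrative only: the observation strings, the returned labels and the "Unclassified" catch-all for mixtures not covered by the three Soft Output types are assumptions.

```python
def classify_fanout(observations):
    """Classify a driver fault from what each load at the fan-out sees.

    observations: one entry per load, each "0" (sees stuck at 0),
    "1" (sees stuck at 1) or "normal" (sees normal behaviour).
    """
    seen = set(observations)
    if not seen or not seen <= {"0", "1", "normal"}:
        raise ValueError("observations must be a non-empty list of 0/1/normal")
    if seen == {"normal"}:
        return "No fault"
    if seen == {"0"}:
        return "Stuck at 0"
    if seen == {"1"}:
        return "Stuck at 1"
    if seen == {"0", "normal"}:
        return "Soft Output (0)"
    if seen == {"1", "normal"}:
        return "Soft Output (1)"
    if seen == {"0", "1"}:
        return "Soft Output (0/1)"
    return "Unclassified"
```

The Soft Output cases are the ones that matter for interlane links, since different receivers then hold different views of the same transmitted bit.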

Faulty Processor was of course one of the postulated locality failure modes and needed three sub-classes,
viz. (a) Program Unaffected, (b) Slight Departure from Normal Program (i.e. the changed program length does
not desynchronise), and (c) Gross Departure from Normal Program (i.e. synchronisation is lost). Also (b)
and (c) needed three and four sub-sub-classes respectively, giving the following eight types of defect:-

A type (a) defect involves one location in the data store being affected; thus several data
words may be spurious.

A type (b1) defect involves non-execution of one INP instruction.

A type (b2) defect involves non-execution of one OUT instruction.

A type (b3) defect involves one EXIT instruction being preceded by a spurious sequence.

A type (c1) defect involves two or more OUT instructions, each one having a type (iii)
or a type (iv) defect.

A type (c2) defect involves repetition of one instruction.

A type (c3) defect involves processor stopped.

A type (c4) defect involves processor running wild.

14 SOFTWARE CORRECTNESS

14.1  Software  design 

In a computer context the word software is used to describe that part of the system which is particular to
the application and which directs the otherwise general hardware to fulfil the requirements of the
application. The term may be defined more generally and more usefully as: Software is that which
structures a set of functional elements to fulfil a task requirement. This latter definition allows
the concept of levels of software. Any system viewed at any particular level of detail may be considered
as a set of functional elements being caused to interact in a manner to fulfil a task requirement. Thus
at one level the functional elements may be individual computer instructions and the task may be to provide
the function of, for example, a limit. At another level the functional elements may be functions providing
integration, first order lag, gain schedule, limit etc. and the task may be to provide a particular control
law. This concept of levels of software is crucial to a software design procedure appropriate to the
production of correct software.

The software design procedure adopted is that which may reasonably be called structured and which takes
place in a top down manner. The essential feature of structured programming is the appreciation of levels of
program and the need at each level to define precisely the functions of the software modules and the
interfaces between them. The highest level of software is the statement of requirements for the system.
This statement defines the way in which the system is required to perform in the environment of aircraft
characteristics, aircraft motions, pilot inputs and hardware failures.

The first task in the production of a structured program is to define a small but convenient number of
functional sub-divisions of the total task and to specify precisely the interfaces between them.

At the second level each of these functional sub-divisions is itself defined in terms of a small number
of sub-functions with precisely specified interfacing.

Progressing in this way the tasks can be broken down into a series of levels. At the lowest level the
basic functional elements are the machine instructions of the CPU and the functions generated are the
simplest sub-routines.

Looking  then  at  the  structure  which  leads  in  steps  from  the  system  requirement  to  the  machine  instructions, 
at  each  step  functions  are  defined  as  combinations  of  simpler  functions  similarly  defined  at  a lower  stage. 
Thus  at  each  stage  the  basic  elements  are  the  functions  which  have  been  defined  at  the  next  lower  stage 
and  the  software  concerns  the  grouping  of  these  basic  functional  elements  to  perform  a more  complex 
function.  The  task  of  software  proving  is  therefore  at  each  stage  to  analyse  the  assertion  that  one 
particular combination of elements, having defined properties, constitutes one particular function and none
other. Thus step by step one can proceed from the behaviour of the system to the basic characteristics
of  the  computing  elements. 

A typical series of steps is:

1.  The  system  requirement  can  be  accomplished  by  performing  a given  series  of  system  tasks. 

2. Each system task can be accomplished by performing a particular series of tasks.

3.  Each  task  can  be  accomplished  by  performing  a particular  series  of  sub-tasks. 

4. Each sub-task can be accomplished by performing a particular series of sub-routines.

5. Each sub-routine can be accomplished by performing a particular series of machine instructions.
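
The lower steps of this series can be sketched in code. The following is a minimal illustration only, written in Python rather than assembler. The names limit, first_order_lag and pitch_damper_task, and all gains and limits, are hypothetical and are not taken from the system described here. Two lowest-level sub-routines are defined first, and a task is then expressed purely as a combination of them.

```python
def limit(x, lo, hi):
    """Lowest-level sub-routine: clamp x to the range [lo, hi]."""
    return max(lo, min(hi, x))


def first_order_lag(state, x, alpha):
    """Lowest-level sub-routine: one step of a discrete first order lag.

    alpha is an assumed smoothing factor, 0 < alpha <= 1.
    """
    return state + alpha * (x - state)


def pitch_damper_task(state, pitch_rate, gain=2.0, alpha=0.5, authority=1.0):
    """Hypothetical task built only from the sub-routines defined above.

    Returns the new filter state and the limited actuator demand.
    """
    new_state = first_order_lag(state, pitch_rate, alpha)
    demand = limit(-gain * new_state, -authority, authority)
    return new_state, demand
```

Because the task contains no logic of its own beyond the grouping of the two sub-routines, its behaviour can be argued from their defined properties alone, which is the point of the layered structure.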

14.2  Software  design  analysis 

The  task  of  software  design  analysis  is  to  establish  that  the  system  task  requirement  is  achieved  by  the 
sequence  of  actions  called  for  by  the  software.  The  structured  manner  of  the  software  design  allows  a 
similarly  structured  approach  to  the  analysis  of  the  design. 

At each stage in the sequence which links the properties of the basic computing elements to the overall
system task, a set of functions is defined, each of which, it is asserted, can be represented by logical
combinations of certain simpler functions. The task of software proving is to examine the validity of these
logical assertions. In principle, the method is the same at all levels. The total logical consequences
of the proposed combination of simpler functions are written down and equated to the logic of the claimed
function. If they are identical the assertion is proved.
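
As a small illustration of this method, the sketch below uses exhaustive enumeration of a truth table in place of a paper analysis. That substitution is an assumption made for the example, as are the Python language and all function names. The claimed function is a check for identity of two logical inputs; the proposed combination uses only AND, OR and NOT elements. The assertion is accepted only if the two tables are identical.

```python
from itertools import product


def truth_table(func, n_inputs):
    """Write down the output of func for every combination of inputs."""
    return [func(*bits) for bits in product((False, True), repeat=n_inputs)]


def claimed_identity(a, b):
    """Specification of the claimed function: true when a and b agree."""
    return a == b


def combination(a, b):
    """Proposed combination of simpler elements (AND, OR, NOT only)."""
    return (a and b) or ((not a) and (not b))


def faulty_combination(a, b):
    """A deliberately wrong combination, to show a failed assertion."""
    return (a and b) or (not a)


def assertion_proved(proposed, claimed, n_inputs):
    """The assertion holds only if the two truth tables are identical."""
    return truth_table(proposed, n_inputs) == truth_table(claimed, n_inputs)
```

Calling assertion_proved(combination, claimed_identity, 2) accepts the assertion, while the faulty combination is rejected because its table differs for the inputs (False, True). Enumeration of this kind is practical only where the input domain is small, which is consistent with the remark below that higher levels tend to be verified by simulation and test.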

The difficulty of the task of proving depends very much on the characteristics of the elements, being most
difficult in those areas in which a concise definition of the task is lacking, e.g. at the higher software
levels. In such difficult areas the method of verification tends to be by simulation and test rather than by
a more formal paper analysis.

14.3  Relevance  of  the  level  of  computer  language 

As in the software design process successively lower and more detailed levels are reached it becomes necessary
to consider the level of language in which the task is to be expressed to the digital processor. The
principal alternatives are:-

1. Assembler code in which the mnemonics, or functional symbols, used have a 1 : 1 relationship
to the machine code on which the processor operates.

2.  A high  level  language  in  which  a computer  takes  on  the  task  of  converting  functional  statements 
into sequences of machine code to cause the desired functions to be performed.

The latter has an immediate appeal in that more 'English like' statements can be written, giving the software
an appearance of visibility. This characteristic of visibility is achieved by delegating the detailed
processor operation to be organised by the compiler in response to the high level statements. However,
this means that the compiler itself is an integral part in the chain which produces the machine instructions.
In which case, either the validity of the compiler itself must be proved to a level appropriate to something
which can have a coherent effect on the multilane system or the machine code which it generates must be
analysed to ensure that the required functions are being correctly performed. The former is a task of
extreme difficulty and the need for the latter removes the apparent advantage of the high level language.

The use of assembler level programming in conjunction with a structured programming approach does not possess
these disadvantages. The tasks which are to be coded are the lowest level of sub-routine performing simple
functions such as Limit, First order lag, Check for Identity etc. for which an assembler language program
is appropriate, and since the assembler provides a 1 : 1 relationship between the symbolic input and the single
line of code, the problem of establishing its validity by direct test is simple. With regard to visibility,
because the total task is structured in such a way that each assembler routine has a small well-defined task
to perform and larger functions are achieved by grouping sequences of these simpler tasks, the
visibility is at least as good as with a higher level language.
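
The 1 : 1 property can be illustrated with a toy assembler. This is a sketch under stated assumptions: the mnemonics, opcodes and 16-bit word layout are invented for the example and do not describe the processor used in the system. Each source line yields exactly one machine word, so every word can be checked against its source line by direct test.

```python
# Hypothetical instruction set: the mnemonics and opcodes are invented
# for illustration and do not describe any real processor.
OPCODES = {"LDA": 0x1, "ADD": 0x2, "SUB": 0x3, "STA": 0x4}


def assemble_line(line):
    """Translate one source line into exactly one 16-bit machine word.

    Word layout (assumed): 4-bit opcode followed by a 12-bit address.
    """
    mnemonic, operand = line.split()
    address = int(operand, 16)
    if not 0 <= address < 0x1000:
        raise ValueError("address out of range: " + operand)
    return (OPCODES[mnemonic] << 12) | address


def assemble(source):
    """Assemble a program line by line; output length equals input length."""
    return [assemble_line(line) for line in source]


def disassemble_word(word):
    """Inverse mapping, used to check each word by direct test."""
    names = {code: name for name, code in OPCODES.items()}
    return "{} {:03X}".format(names[word >> 12], word & 0xFFF)
```

For a program such as ["LDA 010", "ADD 011", "STA 012"], disassembling each assembled word reproduces the source line it came from. No comparable line-by-line check is available for compiler output, where one statement may expand into many instructions.
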

15 FUTURE SYSTEMS IMPROVEMENTS

The need to provide redundancy for outer loop control facilities will depend upon the type of
mission being undertaken. However, further developments of the system described herein to incorporate
the autopilot facilities normally available in a modern helicopter and facilities for coupling to
modern guidance aids can be achieved by computer programme extension and sensor input implementation.

Modern helicopters are now being designed with semi-rigid rotors which simplify the hub design
but introduce additional high speed stability problems. Future developments in helicopter design
are likely to be towards rigid rotors, for which the material problems are understood to be less severe.
However, such designs will introduce additional speed dependent cross coupling effects, and to counteract
these a versatile automatic control system will be required. It is considered that this is more likely
to be realisable using digital rather than analogue techniques.

Rapid developments of digital devices suitable for airborne application are now taking place; for
example the LSI processor mentioned in section 5, which occupies two boards, has a greater capability than
the SSCIO, which occupies 6 out of 19 boards. Also under development at Smiths Industries is the
utilisation of both microprocessor slices and 'complete' microprocessors. Similar developments are taking
place in peripheral devices such as A/D and D/A converters, multiplexers, etc.

Development of fibre optic data transmission systems and multi-sensor digital data acquisition
systems with good RF noise rejection capabilities has also been carried out by Smiths Industries, with a view
to integration into the next generation of flight control systems.

16 CONCLUSIONS

This paper highlights pilot work load as a major obstacle to the tactical use of helicopters at
night and in poor visibility. The work load can be reduced by the fitting of an autostabiliser to the
aircraft but the maximum gain will not be realised unless there is adequate pilot confidence in the
integrity of the system. A defect-survival system can give this confidence and such a system has been
selected for flight evaluation in a helicopter at the Royal Aircraft Establishment. Further developments
in the area of digital, multiplex flight control systems depend largely on future operational requirements
and aircraft design trends.

15. Abstract

The intent of the AGARDograph is to address the hardware, software and man-machine interface
aspects of reliable flight control systems. Rapid advances in solid-state electronics which resulted
in a hundred-fold decrease in computer size, power and cost over the past two decades have
revolutionized the design of modern flight control systems. Designers have capitalized on these
gains primarily by incorporating additional control functions to improve aircraft or weapon
system  performance  and  survivability.  As  a result,  control  system  complexity  also  has  increased 
by  1 to  2 orders  of  magnitude,  and  highly-reliable  flight  control  system  operation  has  become 
critically  important  to  mission  planning  and  execution.  While  some  increases  in  system 
reliability  were  obtained  through  redundancy  in  system  mechanization,  concerted  efforts  aimed 
at  improving  system  integrity  were  not  initiated  until  the  late  1960's.  This  AGARDograph 
summarizes  associated  analysis,  design,  development  and  checkout  approaches. 


The AGARDograph is organized into three major parts: Part I, Background and Requirements;
Part II, Analysis and Testing; and Part III, Design and Implementation.


This  AGARDograph  was  prepared  at  the  request  of  the  Guidance  and  Control  Panel  of 